- Users messaging OpenClaw bots via Telegram/Slack hit stopped sandboxes — current proxy returns HTML waiting pages, which webhooks can't use
- URL stability is already solved: stable subdomain proxy (`{key}.basedomain.com`) survives sandbox restarts
- Build dedicated `/api/webhooks/telegram` (and `/slack`) endpoints that return `200 OK` immediately and queue messages in Redis
- Send "booting up..." progress messages via platform API, editing them every ~30s until sandbox is ready
- Call OpenClaw directly via `sandbox.domain(3000)` + gateway token, skipping the proxy and password gate entirely
- Add a cron-based messaging pump to send progress updates and drain queues
- Extend idle timeout from 10min to 30-60min for sandboxes with active conversations
- Open question: OpenClaw has a `--skip-channels` flag. Should we enable built-in Telegram/Slack support instead of building our own adapter?
When users message our OpenClaw bots via Telegram/Slack/etc., the sandbox is often stopped. The current proxy returns HTML waiting pages — useless for webhook senders that expect a fast 200 OK.
Our stable subdomain proxy ({key}.basedomain.com) survives sandbox restarts. Raw sandbox hostnames are ephemeral, but the proxy re-resolves them via Redis. Webhook URLs registered with Telegram will keep working across restarts.
POST /api/webhooks/telegram (and /slack, etc.)
- Lives under `/api/*` so `proxy.ts` never touches it
- Parses `sandboxKey` from `Host` header via existing `parseSubdomain`
- Validates webhook (Telegram secret token, Slack signature)
- Deduplicates retries with `SET NX EX` in Redis
- Enqueues message in Redis LIST, triggers restore via `after()`
- Sends "booting up..." message via platform API
- Returns `200 OK` immediately
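A minimal sketch of the validation step, assuming Node's built-in `crypto` module. The function names (`verifyTelegramSecret`, `verifySlackSignature`, `slackSignature`, `safeEqual`) are hypothetical and not part of the existing codebase. Telegram echoes the `secret_token` passed to `setWebhook` back in the `X-Telegram-Bot-Api-Secret-Token` header; Slack signs `v0:{timestamp}:{rawBody}` with the app's signing secret and sends the result in `X-Slack-Signature` alongside `X-Slack-Request-Timestamp`.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Constant-time string comparison; timingSafeEqual throws on length mismatch,
// so the lengths are checked first.
function safeEqual(a: string, b: string): boolean {
  const ab = Buffer.from(a);
  const bb = Buffer.from(b);
  return ab.length === bb.length && timingSafeEqual(ab, bb);
}

// Telegram: compare the X-Telegram-Bot-Api-Secret-Token header value
// against the secret registered with setWebhook.
function verifyTelegramSecret(headerValue: string | undefined, expected: string): boolean {
  return headerValue !== undefined && safeEqual(headerValue, expected);
}

// Slack: signature is "v0=" + hex(HMAC-SHA256(signingSecret, "v0:{ts}:{body}")).
function slackSignature(signingSecret: string, timestamp: string, rawBody: string): string {
  const mac = createHmac("sha256", signingSecret)
    .update(`v0:${timestamp}:${rawBody}`)
    .digest("hex");
  return `v0=${mac}`;
}

// Rejects requests whose timestamp is more than five minutes from "now"
// (replay protection), then compares signatures in constant time.
function verifySlackSignature(
  signingSecret: string,
  timestamp: string,
  rawBody: string,
  signature: string,
  nowSeconds: number,
): boolean {
  const ts = Number(timestamp);
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > 300) return false;
  return safeEqual(slackSignature(signingSecret, timestamp, rawBody), signature);
}
```

Both checks need the raw request body and headers, so they would have to run before any JSON parsing in the route handler.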
Per-conversation LIST + dedup keys:
- `msgq:{platform}:{sandboxKey}:{conversationId}`: message queue
- `msgq:dedup:{platform}:{eventId}`: retry dedup
- `msgq:drain-lock:{platform}:{sandboxKey}:{conversationId}`: drain lock
- `msgq:warm:{platform}:{sandboxKey}:{conversationId}`: warmup job state
Atomic batch drain via EVAL (LRANGE + LTRIM).
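The key layout and the atomic drain could be sketched as follows. The key builders mirror the patterns listed above. The `RedisEval` interface, `drainBatch`, and the default batch size of 50 are assumptions for illustration; the real client's `eval` signature may differ. The Lua script reads up to `n` items from the head of the list and trims them off in the same call, so a concurrent drainer cannot see the same items.

```typescript
type Platform = "telegram" | "slack";

// Key builders following the msgq:* naming scheme.
const queueKey = (p: Platform, sandboxKey: string, conversationId: string) =>
  `msgq:${p}:${sandboxKey}:${conversationId}`;
const dedupKey = (p: Platform, eventId: string) => `msgq:dedup:${p}:${eventId}`;
const drainLockKey = (p: Platform, sandboxKey: string, conversationId: string) =>
  `msgq:drain-lock:${p}:${sandboxKey}:${conversationId}`;
const warmKey = (p: Platform, sandboxKey: string, conversationId: string) =>
  `msgq:warm:${p}:${sandboxKey}:${conversationId}`;

// LRANGE the first n items, then LTRIM them away. Redis runs a script
// atomically, so no other command interleaves between the two calls.
const DRAIN_SCRIPT = `
local n = tonumber(ARGV[1])
local items = redis.call('LRANGE', KEYS[1], 0, n - 1)
if #items > 0 then
  redis.call('LTRIM', KEYS[1], #items, -1)
end
return items
`;

// Hypothetical minimal client shape; adapt to the Redis client actually in use.
interface RedisEval {
  eval(script: string, keys: string[], args: string[]): Promise<unknown>;
}

// Returns up to `max` queued messages, oldest first, removing them from the list.
async function drainBatch(redis: RedisEval, key: string, max = 50): Promise<string[]> {
  const result = await redis.eval(DRAIN_SCRIPT, [key], [String(max)]);
  return Array.isArray(result) ? result.map(String) : [];
}
```

This assumes producers append with `RPUSH`, so index 0 is always the oldest message.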
Extract the proxy route's restore-from-snapshot into src/server/sandboxes/restore-runtime.ts. Reuse from:
- Proxy route (browser traffic)
- Webhook routes (messaging)
- Admin restore route (fixes inconsistency where admin restore currently skips `.on-restore.sh`)
Send "booting up..." via platform API, edit the same message with updates:
- T=0: "Starting up..."
- T=30s: "Still starting..."
- T=60-90s: "Still starting; I'll respond when ready."
- T=180s: "Still not ready; I'll keep trying."
- When ready: Drain queue, send actual responses
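The schedule above maps naturally onto a small pure function. This is a sketch only: `progressText` and `nextEdit` are hypothetical names, and the "T=60-90s" step is assumed to begin at 60 seconds.

```typescript
interface ProgressStep {
  afterSeconds: number;
  text: string;
}

const FIRST_TEXT = "Starting up...";

// Ordered from latest to earliest so the first match is the current step.
const PROGRESS_STEPS: ProgressStep[] = [
  { afterSeconds: 180, text: "Still not ready; I'll keep trying." },
  { afterSeconds: 60, text: "Still starting; I'll respond when ready." }, // assumed: lower bound of 60-90s
  { afterSeconds: 30, text: "Still starting..." },
  { afterSeconds: 0, text: FIRST_TEXT },
];

// Text the progress message should show after `elapsedSeconds` of waiting.
function progressText(elapsedSeconds: number): string {
  const step = PROGRESS_STEPS.find((s) => elapsedSeconds >= s.afterSeconds);
  return step ? step.text : FIRST_TEXT;
}

// Returns the new text to send, or null when the message already shows it,
// so the caller can skip a no-op edit call to the platform API.
function nextEdit(lastSentText: string, elapsedSeconds: number): string | null {
  const text = progressText(elapsedSeconds);
  return text === lastSentText ? null : text;
}
```

Keeping the schedule as data makes it easy to tune the wording or timing without touching the pump logic.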
/api/cron/messaging-pump — same locking pattern as existing check-sandboxes:
- Send progress updates for sandboxes still restoring
- Drain queues for sandboxes that are now running
- Expire stale warmup jobs and notify users
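One way to picture a single pump iteration is as a decision function over the sandbox status and the warmup job state. All names here (`SandboxStatus`, `WarmupJob`, `decidePumpAction`) are hypothetical, and the 10-minute staleness cutoff is an assumption; only the ~30s progress interval comes from this document.

```typescript
type SandboxStatus = "running" | "restoring" | "stopped";
type PumpAction = "drain" | "progress" | "expire" | "wait";

// Assumed shape of the state stored under the msgq:warm:* key.
interface WarmupJob {
  startedAtMs: number;
  lastProgressAtMs: number;
}

const PROGRESS_INTERVAL_MS = 30 * 1000; // "every ~30s"
const STALE_AFTER_MS = 10 * 60 * 1000; // assumption: give up after 10 minutes

// Decides what the pump should do for one conversation on this tick.
function decidePumpAction(status: SandboxStatus, job: WarmupJob, nowMs: number): PumpAction {
  if (status === "running") return "drain"; // sandbox is up: deliver queued messages
  if (nowMs - job.startedAtMs >= STALE_AFTER_MS) return "expire"; // notify user, drop job
  if (nowMs - job.lastProgressAtMs >= PROGRESS_INTERVAL_MS) return "progress";
  return "wait"; // nothing due yet
}
```

The cron handler would still need to hold the drain lock before acting on a `drain` result, following the same locking pattern as `check-sandboxes`.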
Extend idle threshold from 10min → 30-60min for sandboxes with active messaging conversations. Reduces unnecessary cold starts during normal chat pauses.
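As a rough sketch, the idle check could pick its threshold based on whether the sandbox has an active messaging conversation. The names are hypothetical, and 30 minutes is assumed here as the lower end of the proposed 30-60min range.

```typescript
const DEFAULT_IDLE_MS = 10 * 60 * 1000; // current 10min threshold
const MESSAGING_IDLE_MS = 30 * 60 * 1000; // assumption: lower end of 30-60min

// Idle threshold that applies to a sandbox.
function idleThresholdMs(hasActiveConversation: boolean): number {
  return hasActiveConversation ? MESSAGING_IDLE_MS : DEFAULT_IDLE_MS;
}

// True when the sandbox has been idle long enough to be stopped.
function shouldStopSandbox(
  lastActivityAtMs: number,
  hasActiveConversation: boolean,
  nowMs: number,
): boolean {
  return nowMs - lastActivityAtMs >= idleThresholdMs(hasActiveConversation);
}
```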
Skip the proxy for messaging. Load sandbox meta from Redis, then:
```ts
const sandbox = await Sandbox.get({ sandboxId: meta.sandboxId })
const baseUrl = sandbox.domain(3000)
fetch(baseUrl + '/api/...', {
  headers: { Authorization: `Bearer ${meta.gatewayToken}` }
})
```
This avoids the password gate entirely and keeps OpenClaw private (only our server knows the token).
OpenClaw setup currently passes --skip-channels. Does OpenClaw have built-in Telegram/Slack channel support we should enable instead of building our own adapter layer?