Messaging Cold-Start UX Plan

tl;dr

Users messaging OpenClaw bots via Telegram/Slack hit stopped sandboxes — current proxy returns HTML waiting pages, which webhooks can't use
URL stability is already solved: stable subdomain proxy ({key}.basedomain.com) survives sandbox restarts
Build dedicated /api/webhooks/telegram (and /slack) endpoints that return 200 OK immediately and queue messages in Redis
Send "booting up..." progress messages via platform API, editing them every ~30s until sandbox is ready
Call OpenClaw directly via sandbox.domain(3000) + gateway token — skip the proxy and password gate entirely
Add a cron-based messaging pump to send progress updates and drain queues
Extend idle timeout from 10min to 30-60min for sandboxes with active conversations
Open question: our OpenClaw setup passes --skip-channels. Does OpenClaw have built-in Telegram/Slack support we should enable instead of building our own adapter?

Problem

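When users message our OpenClaw bots via Telegram, Slack, etc., the sandbox is often stopped. The current proxy returns HTML waiting pages. These are useless to webhook senders, which expect a fast 200 OK and retry when they don't get one. Slack, for example, treats a delivery as failed if it gets no response within 3 seconds.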

Key Insight: URL Stability Is Already Solved

Our stable subdomain proxy ({key}.basedomain.com) survives sandbox restarts. Raw sandbox hostnames are ephemeral, but the proxy re-resolves them via Redis. Webhook URLs registered with Telegram will keep working across restarts.
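
Because the subdomain URL is stable, each bot's Telegram webhook only needs to be registered once, not on every restore. Below is a minimal sketch of that registration. It assumes the bot token and a per-bot secret are stored server-side. `buildSetWebhookRequest` is a hypothetical helper. `setWebhook` and its `url`, `secret_token` and `allowed_updates` parameters come from the Telegram Bot API.

```typescript
// Hypothetical helper: builds the Bot API setWebhook call for one sandbox's bot.
export interface SetWebhookRequest {
  url: string
  init: { method: 'POST'; headers: Record<string, string>; body: string }
}

export function buildSetWebhookRequest(
  botToken: string,
  sandboxKey: string,
  baseDomain: string,
  secretToken: string,
): SetWebhookRequest {
  // Telegram only accepts 1-256 characters from [A-Za-z0-9_-] for secret_token.
  if (!/^[A-Za-z0-9_-]{1,256}$/.test(secretToken)) {
    throw new Error('invalid secret_token')
  }
  return {
    url: `https://api.telegram.org/bot${botToken}/setWebhook`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        // Stable subdomain: survives sandbox restarts, so this is registered once.
        url: `https://${sandboxKey}.${baseDomain}/api/webhooks/telegram`,
        // Telegram echoes this in the X-Telegram-Bot-Api-Secret-Token header on every update.
        secret_token: secretToken,
        // Assumption for this sketch: only plain messages are needed.
        allowed_updates: ['message'],
      }),
    },
  }
}
```

Usage: `const { url, init } = buildSetWebhookRequest(token, key, 'basedomain.com', secret); await fetch(url, init)`. Limiting `allowed_updates` to `message` is a choice made for this sketch. The plan does not specify it.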

What We Need to Build

1. Dedicated webhook endpoints

POST /api/webhooks/telegram (and /slack, etc.)

Lives under /api/* so proxy.ts never touches it
Parses sandboxKey from Host header via existing parseSubdomain
Validates webhook (Telegram secret token, Slack signature)
Deduplicates retries with SET NX EX in Redis
Enqueues message in Redis LIST, triggers restore via after()
Sends "booting up..." message via platform API
Returns 200 OK immediately
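
Here is a sketch of the validation step. It assumes the route reads the raw body with `await req.text()` before parsing JSON, because Slack signs the exact bytes it sent. The function names are hypothetical. The header names and Slack's `v0` HMAC-SHA256 scheme are the platforms' documented ones.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

// Constant-time comparison; a length mismatch returns early (lengths aren't secret).
function safeEqual(a: string, b: string): boolean {
  const ab = Buffer.from(a)
  const bb = Buffer.from(b)
  return ab.length === bb.length && timingSafeEqual(ab, bb)
}

// Telegram: the X-Telegram-Bot-Api-Secret-Token header must match the
// secret_token registered via setWebhook.
export function verifyTelegram(headerValue: string | null, expectedSecret: string): boolean {
  return headerValue !== null && safeEqual(headerValue, expectedSecret)
}

// Slack: X-Slack-Signature must equal "v0=" + hex HMAC-SHA256 of
// "v0:{X-Slack-Request-Timestamp}:{raw body}", keyed with the signing secret.
export function verifySlack(
  rawBody: string,
  timestamp: string | null,
  signature: string | null,
  signingSecret: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  if (!timestamp || !signature) return false
  const ts = Number(timestamp)
  // Reject requests more than 5 minutes old to limit replay attacks.
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > 5 * 60) return false
  const expected =
    'v0=' + createHmac('sha256', signingSecret).update(`v0:${timestamp}:${rawBody}`).digest('hex')
  return safeEqual(signature, expected)
}
```

For the `SET NX EX` dedup key, Telegram's `update_id` and Slack's `event_id` are natural values for `{eventId}`.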

2. Redis message queue

Per-conversation LIST + dedup keys:

msgq:{platform}:{sandboxKey}:{conversationId} — message queue
msgq:dedup:{platform}:{eventId} — retry dedup
msgq:drain-lock:{platform}:{sandboxKey}:{conversationId} — drain lock
msgq:warm:{platform}:{sandboxKey}:{conversationId} — warmup job state

Atomic batch drain via EVAL (LRANGE + LTRIM).
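
The key layout and the batch drain can be sketched as shown here. The helper names and the batch-size argument are assumptions. The script itself uses only `LRANGE` and `LTRIM`, as the plan describes.

```typescript
export type Platform = 'telegram' | 'slack'

// Key builders matching the layout above.
export const msgqKeys = {
  queue: (p: Platform, sandboxKey: string, conversationId: string) =>
    `msgq:${p}:${sandboxKey}:${conversationId}`,
  dedup: (p: Platform, eventId: string) => `msgq:dedup:${p}:${eventId}`,
  drainLock: (p: Platform, sandboxKey: string, conversationId: string) =>
    `msgq:drain-lock:${p}:${sandboxKey}:${conversationId}`,
  warm: (p: Platform, sandboxKey: string, conversationId: string) =>
    `msgq:warm:${p}:${sandboxKey}:${conversationId}`,
}

// Pop up to ARGV[1] messages from the head of KEYS[1] as one atomic step.
// Redis runs the script without interleaving other commands, so two concurrent
// drainers can't read the same batch, and one drainer's LTRIM can't discard
// items another drainer hasn't read yet.
export const DRAIN_BATCH_LUA = `
local n = tonumber(ARGV[1])
if not n or n < 1 then return {} end
local items = redis.call('LRANGE', KEYS[1], 0, n - 1)
if #items > 0 then
  redis.call('LTRIM', KEYS[1], #items, -1)
end
return items
`
```

With ioredis, for example, a drain would be `await redis.eval(DRAIN_BATCH_LUA, 1, msgqKeys.queue('telegram', key, chatId), 20)`. The `n < 1` guard matters because `LRANGE key 0 -1` would otherwise return the whole list. On Redis 6.2+, `LPOP key count` gives the same atomic pop-a-batch behaviour in a single command, without a script.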

3. Shared restore logic

Extract the proxy route's restore-from-snapshot logic into src/server/sandboxes/restore-runtime.ts and reuse it from:

Proxy route (browser traffic)
Webhook routes (messaging)
Admin restore route (fixes inconsistency where admin restore currently skips .on-restore.sh)
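
One possible shape for the shared module follows, as a sketch only. `restoreRuntime`, `RestoreDeps` and the three step names are hypothetical, and the real proxy logic may have more steps. Injecting the steps keeps all three callers on a single code path, which is what closes the `.on-restore.sh` gap in the admin route.

```typescript
// Hypothetical shape for src/server/sandboxes/restore-runtime.ts.
export interface RestoreDeps {
  createFromSnapshot(snapshotId: string): Promise<{ sandboxId: string }>
  runOnRestoreHook(sandboxId: string): Promise<void> // runs .on-restore.sh
  saveMeta(sandboxKey: string, sandboxId: string): Promise<void> // Redis record the proxy re-resolves
}

// Single restore path shared by the proxy, webhook, and admin routes.
export async function restoreRuntime(
  sandboxKey: string,
  snapshotId: string,
  deps: RestoreDeps,
): Promise<string> {
  const { sandboxId } = await deps.createFromSnapshot(snapshotId)
  // Always run the restore hook; no caller can skip it.
  await deps.runOnRestoreHook(sandboxId)
  // Publish the new sandboxId last, after the hook has completed.
  await deps.saveMeta(sandboxKey, sandboxId)
  return sandboxId
}
```

Saving the sandbox mapping last is a design choice in this sketch. With that ordering, the proxy and the messaging pump only ever resolve to a sandbox whose restore hook has already run.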

4. Platform progress messages

Send "booting up..." via platform API, edit the same message with updates:

T=0: "Starting up..."
T=30s: "Still starting..."
T=60-90s: "Still starting; I'll respond when ready."
T=180s: "Still not ready; I'll keep trying."
When ready: Drain queue, send actual responses
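
The schedule reduces to a pure function of elapsed time plus an edit call. This sketch assumes the `message_id` from the initial `sendMessage` response is stored in the `msgq:warm:*` job. `progressText` and `telegramEditRequest` are hypothetical. `editMessageText` is the Bot API method. The 60s threshold for the third message is an assumption within the plan's 60-90s window.

```typescript
// Thresholds from the schedule above, checked from latest to earliest.
const STAGES: Array<[number, string]> = [
  [180_000, "Still not ready; I'll keep trying."],
  [60_000, "Still starting; I'll respond when ready."], // assumed 60s within 60-90s
  [30_000, 'Still starting...'],
  [0, 'Starting up...'],
]

export function progressText(elapsedMs: number): string {
  for (const [minMs, text] of STAGES) {
    if (elapsedMs >= minMs) return text
  }
  return STAGES[STAGES.length - 1][1]
}

// Returns null when the text wouldn't change: Telegram rejects edits that
// leave the message identical ("message is not modified").
export function telegramEditRequest(
  botToken: string,
  chatId: number,
  messageId: number,
  previousText: string,
  elapsedMs: number,
): { url: string; body: string } | null {
  const text = progressText(elapsedMs)
  if (text === previousText) return null
  return {
    url: `https://api.telegram.org/bot${botToken}/editMessageText`,
    body: JSON.stringify({ chat_id: chatId, message_id: messageId, text }),
  }
}
```

On Slack, the equivalent edit is `chat.update`, using the `channel` and `ts` returned by `chat.postMessage`.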

5. Cron-based messaging pump

/api/cron/messaging-pump — same locking pattern as existing check-sandboxes:

Send progress updates for sandboxes still restoring
Drain queues for sandboxes that are now running
Expire stale warmup jobs and notify users
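
For each warmup job, a pump tick comes down to one decision. This sketch assumes the job records when it started and when progress was last sent. `WarmJob`, `decidePumpAction` and the 10-minute stale cut-off are hypothetical. A drain would still take the `msgq:drain-lock:*` key first, so a pump tick and a webhook-triggered drain don't process the same queue.

```typescript
// Hypothetical state kept under msgq:warm:{platform}:{sandboxKey}:{conversationId}.
export interface WarmJob {
  startedAt: number // ms epoch when the first message was queued
  lastProgressAt: number // ms epoch of the last progress message or edit
}

export type SandboxStatus = 'restoring' | 'running' | 'failed'
export type PumpAction = 'drain' | 'progress' | 'expire' | 'none'

const PROGRESS_INTERVAL_MS = 30_000 // the plan's ~30s edit cadence
const STALE_AFTER_MS = 10 * 60_000 // assumed cut-off; the plan doesn't set one

// Decide what one pump tick should do for one warmup job.
export function decidePumpAction(job: WarmJob, status: SandboxStatus, now: number): PumpAction {
  // Sandbox is up: deliver the queued messages.
  if (status === 'running') return 'drain'
  // Restore failed or is taking too long: give up and notify the user.
  if (status === 'failed' || now - job.startedAt > STALE_AFTER_MS) return 'expire'
  // Still restoring: send a progress edit only once the interval has passed.
  return now - job.lastProgressAt >= PROGRESS_INTERVAL_MS ? 'progress' : 'none'
}
```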

6. Smarter keepalive

Extend idle threshold from 10min → 30-60min for sandboxes with active messaging conversations. Reduces unnecessary cold starts during normal chat pauses.
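
The keepalive change amounts to switching between two thresholds. This sketch assumes the caller already knows whether the sandbox has an active messaging conversation, for example a non-empty `msgq:*` queue or a recent inbound message. The names are hypothetical, and 30 minutes is the low end of the plan's 30-60min range.

```typescript
const IDLE_MINUTE_MS = 60_000
const DEFAULT_IDLE_MS = 10 * IDLE_MINUTE_MS // current threshold
const MESSAGING_IDLE_MS = 30 * IDLE_MINUTE_MS // assumed: low end of the 30-60min range

// Idle threshold for one sandbox; hasActiveConversation is decided by the caller.
export function idleThresholdMs(hasActiveConversation: boolean): number {
  return hasActiveConversation ? MESSAGING_IDLE_MS : DEFAULT_IDLE_MS
}

// True when the sandbox has been idle longer than its threshold and can be stopped.
export function shouldStop(lastActivityAt: number, now: number, hasActiveConversation: boolean): boolean {
  return now - lastActivityAt > idleThresholdMs(hasActiveConversation)
}
```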

Architecture Decision: Call OpenClaw Directly

Skip the proxy for messaging. Load sandbox meta from Redis, then:

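```ts
import { Sandbox } from '@vercel/sandbox' // assumed: Vercel Sandbox SDK

// meta: this sandbox's record loaded from Redis (sandboxId, gatewayToken, ...)
const sandbox = await Sandbox.get({ sandboxId: meta.sandboxId })
const baseUrl = sandbox.domain(3000) // public URL for port 3000, where OpenClaw listens
const res = await fetch(baseUrl + '/api/...', {
  headers: { Authorization: `Bearer ${meta.gatewayToken}` },
})
if (!res.ok) throw new Error(`OpenClaw request failed: ${res.status}`)
```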

This avoids the password gate entirely and keeps OpenClaw private (only our server knows the token).

Open Question

OpenClaw setup currently passes --skip-channels. Does OpenClaw have built-in Telegram/Slack channel support we should enable instead of building our own adapter layer?

johnlindquist/messaging-coldstart-ux-plan.md