Clean-sheet redesign of Claude Code session management for the Baymax agent platform. Replaces the custom tmux bridge and trigger daemon with the Claude Agent SDK + Bus integration.
Current architecture:

```
Lab → Bus (skill) → Trigger Daemon (port 3099) → claude -p subprocess
                         ↓ events
                    Bus (passive listener) → Lab callback
```
Three services in the critical path. Trigger daemon holds canonical state in memory (lost on restart). Bus is passive/reactive. Lab polls for status. The trigger daemon is a 600-line mjs file doing subprocess management, HTTP API, SSE streaming, worktree management, and session persistence — all responsibilities that should be separated.
"Remote Control" (claude remote-control) is a UI feature for controlling local CLI from claude.ai/mobile — not a programmatic API. The actual programmatic interface is the Claude Agent SDK (@anthropic-ai/claude-agent-sdk):
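```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({
  prompt: "Find and fix the bug in auth.py",
  options: {
    allowedTools: ["Read", "Edit", "Bash"],
    maxBudgetUsd: 5.0,
    maxTurns: 50,
    // Hooks map an event name to matchers, each holding a list of async callbacks
    hooks: { Stop: [{ hooks: [async (input) => { /* ... */ return {}; }] }] },
  },
})) {
  // Streamed messages: system init (carries session_id), assistant turns
  // (including tool_use blocks), and a final result
  if (message.type === "system" && message.subtype === "init") {
    console.log("session started:", message.session_id);
  } else if (message.type === "result") {
    console.log("finished:", message.subtype, message.total_cost_usd);
  }
}
```

The SDK handles session lifecycle, tool permissions, budget limits, and structured output natively. This removes the need for subprocess management, terminal scraping, and custom watchdogs.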
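Inside the Bus, most of the session manager's work is translating this message stream into the `session.*` events defined later in this document. The sketch below shows that translation. The message types are a reduced, locally defined subset written for illustration. Their field names (`session_id`, `subtype`, `total_cost_usd`, `num_turns`, `result`) are meant to mirror the SDK's init and result messages, but they are not the SDK's exported types and should be checked against the SDK reference. `toBusEvents` is a hypothetical helper.

```typescript
// Reduced stand-ins for SDK stream messages (assumption: only the fields used here).
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

type StreamMessage =
  | { type: "system"; subtype: "init"; session_id: string }
  | { type: "assistant"; message: { content: ContentBlock[] } }
  | { type: "result"; subtype: string; total_cost_usd: number; num_turns: number; result?: string };

interface BusEvent {
  topic: string;
  payload: Record<string, unknown>;
}

// Hypothetical helper: turn one SDK message into zero or more session.* events.
function toBusEvents(sessionId: string, msg: StreamMessage): BusEvent[] {
  if (msg.type === "system") {
    // The init message carries the SDK session ID that a later resume needs.
    return [{ topic: "session.spawned", payload: { sessionId, agentSessionId: msg.session_id } }];
  }
  if (msg.type === "assistant") {
    // tool_use blocks become session.tool_use; text blocks become session.output.
    return msg.message.content.map((block): BusEvent =>
      block.type === "tool_use"
        ? { topic: "session.tool_use", payload: { sessionId, tool: block.name, args: block.input } }
        : { topic: "session.output", payload: { sessionId, text: block.text } },
    );
  }
  // Remaining case: the final result. "success" completes; any other subtype fails.
  return msg.subtype === "success"
    ? [{ topic: "session.completed", payload: { sessionId, result: msg.result, costUsd: msg.total_cost_usd, turns: msg.num_turns } }]
    : [{ topic: "session.failed", payload: { sessionId, error: msg.subtype, costUsd: msg.total_cost_usd } }];
}
```

Because the mapping is a pure function, it can be unit-tested without a live `query()`. The manager would call it inside the `for await` loop and publish whatever it returns.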
Embed the session manager in the Bus. The Bus already has a persistent session registry (SQLite), an event log with replay, WebSocket pub/sub, and durable workflows (OpenWorkflow).
```
Any caller (Lab, Hank, API)
  → Bus API: POST /sessions/spawn
  → Bus Session Manager (Agent SDK query())
      → streams events to Bus WebSocket
      → persists state to Bus SQLite
      → publishes session.* events
  → caller subscribes to events via WebSocket
```
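Callers subscribe with topic patterns such as `session.*`, as in the monitoring example in this document. That requires some topic-matching rule on the Bus side. The sketch below is a minimal in-process version. It assumes `*` matches exactly one dot-separated segment; the Bus's actual matching rules are not specified here. The WebSocket transport is omitted, and `TopicBus` is a hypothetical name.

```typescript
type Handler = (topic: string, payload: unknown) => void;

// Hypothetical in-process stand-in for Bus pub/sub. "*" matches exactly one
// dot-separated segment, so "session.*" matches "session.created" but not "session".
class TopicBus {
  private subs: { pattern: string[]; handler: Handler }[] = [];

  subscribe(pattern: string, handler: Handler): () => void {
    const entry = { pattern: pattern.split("."), handler };
    this.subs.push(entry);
    // Return an unsubscribe function.
    return () => {
      this.subs = this.subs.filter((s) => s !== entry);
    };
  }

  // Delivers to every matching subscriber and returns how many received it.
  publish(topic: string, payload: unknown): number {
    const parts = topic.split(".");
    let delivered = 0;
    for (const { pattern, handler } of this.subs) {
      if (matches(pattern, parts)) {
        handler(topic, payload);
        delivered++;
      }
    }
    return delivered;
  }
}

function matches(pattern: string[], parts: string[]): boolean {
  return pattern.length === parts.length && pattern.every((p, i) => p === "*" || p === parts[i]);
}
```

With this rule, Mission Control, the Lab UI, and a CLI can each subscribe to `session.*` and receive the same lifecycle stream. `notification.*` topics stay separate.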
```typescript
import { query } from "@anthropic-ai/claude-agent-sdk"; // used by the implementation

// Public surface of the session manager (signatures only).
declare class SessionManager {
  // Spawn: creates the session record, starts query(), streams results
  spawn(config: SessionConfig): Promise<Session>;
  // Send: resumes an idle session with a follow-up message
  send(sessionId: string, message: string): Promise<void>;
  // Kill: aborts the running query, marks the session killed
  kill(sessionId: string): Promise<void>;
  // List: reads from SQLite (survives restarts)
  list(filter?: SessionFilter): Promise<Session[]>;
}
```

```typescript
interface SessionConfig {
  prompt: string;
  workDir: string;

  // Identity
  name?: string; // human-readable session name
  owner?: string; // agent or user who spawned it

  // Linkage
  runIdentifier?: string; // Lab run (RUN-*) for completion callback
  parentSessionId?: string; // for session chains

  // Limits
  maxBudgetUsd?: number; // default: 5.00
  maxTurns?: number; // default: 50
  idleTimeoutMs?: number; // default: 120_000

  // Agent SDK options
  allowedTools?: string[];
  permissionMode?: string; // e.g. "default", "acceptEdits", "plan"
  claudeMdContent?: string; // injected as project context

  // Callback routing
  notify?: {
    channel: "telegram" | "slack" | "bus";
    target: string; // chat ID, channel ID, or bus topic
  };
}
```

| Event | When |
|---|---|
| session.created | Config validated, queued |
| session.queued | At capacity; waiting with a queue position |
| session.spawned | Agent SDK query() started |
| session.output | Streaming chunks (opt-in subscription) |
| session.tool_use | Tool call started (name, args) |
| session.tool_done | Tool call completed (result summary) |
| session.idle | No activity for idleTimeoutMs |
| session.resumed | Follow-up message sent |
| session.completed | query() finished, result captured |
| session.failed | Error or budget exceeded |
| session.killed | Manually terminated |
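These events imply a status lifecycle that SessionManager could enforce before persisting a status change. The sketch below shows one possible transition table. Several choices in it are assumptions, not part of this design:

- The status names.
- Output and tool events keep a session running, and they also wake an idle one.
- Completed, failed, and killed are terminal states.

`session.created` is treated as the initial status rather than a transition. `transition` is a hypothetical helper.

```typescript
type Status = "created" | "queued" | "running" | "idle" | "completed" | "failed" | "killed";

type SessionEvent =
  | "session.queued" | "session.spawned" | "session.output" | "session.tool_use"
  | "session.tool_done" | "session.idle" | "session.resumed"
  | "session.completed" | "session.failed" | "session.killed";

const TERMINAL: ReadonlySet<Status> = new Set<Status>(["completed", "failed", "killed"]);

// Assumed transition table: legal events per status and the status each leads to.
const TRANSITIONS: Record<Status, Partial<Record<SessionEvent, Status>>> = {
  created: { "session.queued": "queued", "session.spawned": "running" },
  queued: { "session.spawned": "running" },
  running: {
    "session.output": "running", "session.tool_use": "running", "session.tool_done": "running",
    "session.idle": "idle", "session.completed": "completed",
  },
  idle: {
    "session.output": "running", "session.tool_use": "running", "session.tool_done": "running",
    "session.resumed": "running", "session.completed": "completed",
  },
  completed: {},
  failed: {},
  killed: {},
};

function transition(from: Status, event: SessionEvent): Status {
  // failed and killed are accepted from any non-terminal status.
  if (!TERMINAL.has(from) && (event === "session.failed" || event === "session.killed")) {
    return event === "session.failed" ? "failed" : "killed";
  }
  const next = TRANSITIONS[from][event];
  if (next === undefined) throw new Error(`illegal event ${event} in status ${from}`);
  return next;
}
```

Rejecting illegal transitions means a late event cannot reopen a finished session. For example, a `session.resumed` arriving after `session.killed` is refused.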
Extend the existing Bus sessions table:
```sql
ALTER TABLE sessions ADD COLUMN agent_session_id TEXT; -- Agent SDK session ID for resume
ALTER TABLE sessions ADD COLUMN config TEXT;           -- full SessionConfig, JSON-encoded
ALTER TABLE sessions ADD COLUMN result TEXT;           -- final output
ALTER TABLE sessions ADD COLUMN cost_usd REAL;         -- actual cost
ALTER TABLE sessions ADD COLUMN turns INTEGER;         -- actual turns used
ALTER TABLE sessions ADD COLUMN error TEXT;            -- error message if failed
```

Concurrency limits:

```typescript
const LIMITS = {
  maxConcurrent: 4, // total active sessions
  maxPerOwner: 2, // per agent/user
  queueTimeout: 300_000, // 5 min queue wait before failing
};

// When at capacity: queue with position, publish session.queued event
// When a slot opens: dequeue next, publish session.spawned
```

Completion hooks:

```typescript
manager.onComplete(async (session) => {
  // If linked to a Lab run, complete the stage
  if (session.config.runIdentifier) {
    await labApi.completeStage(session.config.runIdentifier, {
      summary: session.result,
      cost: session.costUsd,
    });
  }
  // Notify via the configured channel
  if (session.config.notify) {
    await bus.publish(`notification.${session.config.notify.channel}`, {
      target: session.config.notify.target,
      message: `Session ${session.name} completed`,
    });
  }
});
```

| Current | New |
|---|---|
| Trigger daemon (task-trigger.mjs, port 3099) | Gone — absorbed into Bus |
| tmux / claude -p subprocess management | Agent SDK query() handles it |
| sessions.json file persistence | Bus SQLite (already exists) |
| Lab sessions API (proxy to trigger) | Direct Bus API |
| session-lifecycle.ts (passive handler) | SessionManager completion hooks |
| implement-run skill (spawn wrapper) | SessionManager.spawn() directly |
| Worktree management in trigger daemon | Agent SDK spawn: "worktree" option or pre-spawn hook |
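Once the trigger daemon is gone, the concurrency `LIMITS` described earlier (`maxConcurrent`, `maxPerOwner`, `queueTimeout`) need a small admission controller inside SessionManager. The sketch below makes three assumptions. The queue is FIFO. A request blocked only by its owner's cap does not hold up other owners. The clock is injectable, so timeouts can be checked without real timers. `AdmissionQueue` and its methods are hypothetical names.

```typescript
interface Limits {
  maxConcurrent: number; // total active sessions
  maxPerOwner: number; // per agent/user
  queueTimeout: number; // ms a queued request may wait before failing
}

interface Pending {
  sessionId: string;
  owner: string;
  enqueuedAt: number;
}

// Hypothetical admission controller for SessionManager.spawn().
class AdmissionQueue {
  private readonly limits: Limits;
  private readonly now: () => number;
  private active = new Map<string, string>(); // sessionId -> owner
  private queue: Pending[] = [];

  constructor(limits: Limits, now: () => number = () => Date.now()) {
    this.limits = limits;
    this.now = now;
  }

  // Returns "spawned" if a slot is free now, otherwise the 1-based queue position.
  request(sessionId: string, owner: string): "spawned" | number {
    if (this.canStart(owner)) {
      this.active.set(sessionId, owner);
      return "spawned";
    }
    this.queue.push({ sessionId, owner, enqueuedAt: this.now() });
    return this.queue.length;
  }

  // Frees a slot (or drops a queued request) and returns IDs that may now spawn, FIFO.
  release(sessionId: string): string[] {
    this.active.delete(sessionId);
    this.queue = this.queue.filter((p) => p.sessionId !== sessionId);
    return this.drain();
  }

  // Removes requests that waited longer than queueTimeout and returns their IDs.
  expire(): string[] {
    const cutoff = this.now() - this.limits.queueTimeout;
    const expired = this.queue.filter((p) => p.enqueuedAt < cutoff).map((p) => p.sessionId);
    this.queue = this.queue.filter((p) => p.enqueuedAt >= cutoff);
    return expired;
  }

  private canStart(owner: string): boolean {
    let owned = 0;
    for (const o of this.active.values()) if (o === owner) owned++;
    return this.active.size < this.limits.maxConcurrent && owned < this.limits.maxPerOwner;
  }

  // Starts every queued request that now fits; requests blocked only by their
  // owner's cap stay queued without blocking other owners behind them.
  private drain(): string[] {
    const started: string[] = [];
    const waiting: Pending[] = [];
    for (const p of this.queue) {
      if (this.canStart(p.owner)) {
        this.active.set(p.sessionId, p.owner);
        started.push(p.sessionId);
      } else {
        waiting.push(p);
      }
    }
    this.queue = waiting;
    return started;
  }
}
```

Here is how SessionManager might use these results:

- A `"spawned"` result, or an ID returned from `release`, would start `query()` and publish `session.spawned`.
- A numeric result would be published as `session.queued` with that position.
- IDs returned from a periodic `expire()` call would become `session.failed`.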
Hank (builder agent) spawning a session:
```typescript
const session = await bus.post("/sessions/spawn", {
  prompt: "Implement the auth refactor per the plan...",
  workDir: "/home/sumit/projects/archie-core",
  runIdentifier: "RUN-5",
  owner: "hank",
  maxBudgetUsd: 3.0,
  notify: { channel: "telegram", target: "sumitngupta" },
});
```

Lab pipeline:

```typescript
await bus.post("/sessions/spawn", {
  prompt: buildImplementPrompt(run, artifacts),
  workDir: resolveWorkDir(run),
  runIdentifier: run.identifier,
  owner: "lab",
});
```

Monitoring (any WebSocket subscriber):

```typescript
bus.subscribe("session.*", (event) => {
  // Real-time session lifecycle: Mission Control, Lab UI, and CLI all get the same stream
});
```

Migration steps:

- Add Agent SDK to Bus: `npm install @anthropic-ai/claude-agent-sdk`
- Build SessionManager class in Bus with spawn/send/kill/list
- Add Bus HTTP endpoints: `POST /sessions/spawn`, `GET /sessions`, etc.
- Wire Hank's tools to call Bus instead of trigger daemon
- Wire Lab implement handler to call Bus SessionManager
- Move completion hooks from session-lifecycle.ts into SessionManager
- Deprecate trigger daemon: stop the systemd service
- Clean up: remove archie-core claude-code tools that proxy to trigger daemon
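The `POST /sessions/spawn` endpoint from these steps has to validate the request body and apply the `SessionConfig` defaults: a 5.00 USD budget, 50 turns, and a 120-second idle timeout. The sketch below does this as a pure function, so it can be tested without an HTTP server. It makes several assumptions:

- The body has already been JSON-parsed.
- A non-empty `prompt` and `workDir` are the only hard requirements.
- The config type is a reduced copy of `SessionConfig`.

`parseSpawnRequest` and its error strings are hypothetical.

```typescript
interface SpawnConfig {
  prompt: string;
  workDir: string;
  owner?: string;
  runIdentifier?: string;
  maxBudgetUsd: number;
  maxTurns: number;
  idleTimeoutMs: number;
  notify?: { channel: "telegram" | "slack" | "bus"; target: string };
}

type ParseResult = { ok: true; config: SpawnConfig } | { ok: false; error: string };

// Defaults copied from the SessionConfig comments.
const DEFAULTS = { maxBudgetUsd: 5.0, maxTurns: 50, idleTimeoutMs: 120_000 };
const CHANNELS = ["telegram", "slack", "bus"] as const;

// Hypothetical validator for an already JSON-parsed POST /sessions/spawn body.
function parseSpawnRequest(body: unknown): ParseResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "body must be a JSON object" };
  }
  const b = body as Record<string, unknown>;
  const { prompt, workDir, owner, runIdentifier } = b;
  if (typeof prompt !== "string" || prompt.trim() === "") {
    return { ok: false, error: "prompt is required" };
  }
  if (typeof workDir !== "string" || workDir === "") {
    return { ok: false, error: "workDir is required" };
  }

  // Numeric limits fall back to defaults and must be positive when supplied.
  const limits = { ...DEFAULTS };
  for (const key of Object.keys(DEFAULTS) as (keyof typeof DEFAULTS)[]) {
    const value = b[key];
    if (value === undefined) continue;
    if (typeof value !== "number" || !Number.isFinite(value) || value <= 0) {
      return { ok: false, error: `${key} must be a positive number` };
    }
    limits[key] = value;
  }

  // notify is optional, but when present it needs a known channel and a target.
  let notify: SpawnConfig["notify"];
  if (b.notify !== undefined) {
    const n = b.notify as { channel?: unknown; target?: unknown } | null;
    const channel = CHANNELS.find((c) => c === n?.channel);
    const target = n?.target;
    if (channel === undefined || typeof target !== "string") {
      return { ok: false, error: "notify needs a known channel and a string target" };
    }
    notify = { channel, target };
  }

  return {
    ok: true,
    config: {
      prompt,
      workDir,
      owner: typeof owner === "string" ? owner : undefined,
      runIdentifier: typeof runIdentifier === "string" ? runIdentifier : undefined,
      ...limits,
      notify,
    },
  };
}
```

The HTTP handler would then return a 400 with `error` when `ok` is false. Otherwise it would pass `config` to `SessionManager.spawn()`. This keeps defaulting in one place, whether the caller is Hank, Lab, or a direct API client.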
One service owns session lifecycle end-to-end (Bus). State survives restarts (SQLite). Any service can subscribe to events (WebSocket). The Agent SDK handles the actual Claude interaction properly instead of subprocess management.
Baymax Agent Platform — 2026-03-28