Reverse-engineered from source: letta-ai/claude-subconscious v2.0.2
Generated: 2026-03-28
Claude Code is an AI coding assistant that operates in ephemeral sessions. Every session starts from zero — no memory of past conversations, learned user preferences, project context, or unfinished work. Users must re-explain their codebase, repeat preferences, and re-establish context every time.
There is no built-in mechanism for Claude Code to accumulate institutional knowledge across sessions, detect behavioral patterns, or proactively surface relevant context before a user asks for it.
Claude Subconscious is a persistent background agent that gives Claude Code a long-term memory. It observes session transcripts, reads the codebase, accumulates knowledge over time, and injects contextual guidance back into Claude Code before each prompt — without ever blocking the user's workflow.
It is not a memory database or a logging service. It is a second agent running underneath Claude Code with its own tools, reasoning, and personality — one that builds rapport, develops opinions, and participates in an ongoing dialogue across sessions.
| Actor | Description |
|---|---|
| User | Developer using Claude Code. Sees whispered guidance inline. Can address Subconscious directly. |
| Claude Code | The primary AI coding assistant. Receives injected context from Subconscious via stdout. Can address Subconscious in responses. |
| Subconscious Agent | A Letta-hosted agent with persistent memory, tool access, and its own system prompt. Observes asynchronously, responds on next sync cycle. |
| Letta Platform | Cloud or self-hosted server that hosts the agent, stores memory blocks, manages conversations, and routes messages. |
The agent maintains structured memory across eight domains:
| Memory Block | What It Captures |
|---|---|
| core_directives | Agent behavioral guidelines and processing logic |
| guidance | Active message to whisper to the next Claude Code session |
| user_preferences | Coding style, tool preferences, communication patterns |
| project_context | Architecture decisions, codebase knowledge, known gotchas |
| session_patterns | Recurring behaviors, time-based patterns, common struggles |
| pending_items | Unfinished work, explicit TODOs, follow-up items |
| self_improvement | Guidelines for evolving its own memory architecture |
| tool_guidelines | How to use available tools effectively |
Memory persists indefinitely on the Letta platform. A single agent brain is shared across all projects by default.
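The block layout above can be sketched as a simple record shape. This is an illustrative model only — the field names (`label`, `value`, `updatedAt`) and the `Map`-based "brain" are assumptions, not the Letta API:

```typescript
// Hypothetical sketch of the agent's memory layout. Block labels come
// from the table above; the value shapes are illustrative only.
type MemoryBlockLabel =
  | "core_directives"
  | "guidance"
  | "user_preferences"
  | "project_context"
  | "session_patterns"
  | "pending_items"
  | "self_improvement"
  | "tool_guidelines";

interface MemoryBlock {
  label: MemoryBlockLabel;
  value: string;      // free-form text the agent edits over time
  updatedAt?: string; // ISO timestamp of the last edit
}

// One shared "brain": a single set of blocks used across all projects.
const brain = new Map<MemoryBlockLabel, MemoryBlock>();
brain.set("guidance", {
  label: "guidance",
  value: "Remind the user about the unfinished retry-logic refactor.",
});
```

Because the blocks are global to the agent rather than scoped per project, anything learned in one codebase is available everywhere by default.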
After each Claude Code response, the full transcript is sent to the Subconscious agent asynchronously. The transcript includes:
- User messages (verbatim)
- Assistant responses (including thinking blocks)
- Tool uses and their results (summarized/truncated for readability)
- Session summaries
The agent processes this material actively — extracting preferences from user corrections, noting stuck patterns, tracking architectural decisions, and identifying unfinished work.
Before each user prompt is processed, the system injects the agent's accumulated context into Claude Code's prompt via stdout. Two modes control what Claude sees:
| Mode | Injected Content |
|---|---|
| whisper (default) | Only messages from Subconscious — lightweight, speaks when it has something to say |
| full | Memory blocks (first prompt) + diffs of changed blocks (subsequent) + messages |
Content is wrapped in XML tags (<letta_message>, <letta_memory_blocks>, <letta_memory_update>) and injected via stdout. Nothing is written to disk — no CLAUDE.md modifications, no file-based side effects.
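A minimal sketch of the whisper delivery path, assuming the tag and attribute names shown in the example output later in this document (the helper names here are hypothetical):

```typescript
// Escape the XML special characters so the message body cannot break
// out of the wrapping tag.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap an agent message in the <letta_message> tag described above.
function wrapWhisper(from: string, body: string, timestamp: string): string {
  return [
    `<letta_message from="${escapeXml(from)}" timestamp="${timestamp}">`,
    escapeXml(body),
    `</letta_message>`,
  ].join("\n");
}

// Printing to stdout is the entire delivery mechanism: nothing is
// written to disk.
process.stdout.write(
  wrapWhisper("Subconscious", "Check the failing CI job first.", new Date().toISOString()) + "\n",
);
```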
During long tool-use workflows, the system checks for new messages or memory changes before each tool execution. If the agent has updated its guidance or memory while Claude Code was working, the updates are injected as additionalContext mid-stream — addressing "workflow drift" in long sessions.
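The fast-path check can be sketched as follows. The result shape is an assumption based on the `additionalContext` mechanism named above, not a documented hook schema:

```typescript
// Sketch of the PreToolUse fast path: return extra context only when
// the agent produced something new while Claude Code was working.
interface PreToolUseResult {
  additionalContext?: string; // injected mid-stream when present
}

function checkForUpdates(newMessages: string[]): PreToolUseResult {
  if (newMessages.length === 0) return {}; // silent no-op (fast path)
  return { additionalContext: newMessages.join("\n") };
}
```

The empty-object fast path matters because this check runs before every single tool execution and must stay well inside the hook's time budget.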
The Subconscious agent has real tool access via the Letta Code SDK, configurable in three tiers:
| Mode | Available Tools | Use Case |
|---|---|---|
| read-only (default) | Read, Grep, Glob, web_search, fetch_webpage | Safe background research and file exploration |
| full | All tools including Bash, Edit, Write, Task | Full autonomy — agent can make changes and spawn sub-agents |
| off | None | Listen-only — processes transcripts without client-side tools |
This means the agent can read your files, search your codebase, and browse the web while processing transcripts — not just passively ingest text.
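The three tiers map naturally onto allowed-tool lists. The tool names below come from the table above; the shape of the permission model itself is an assumption:

```typescript
// Illustrative mapping from SDK tool tier to the tools the agent may use.
type SdkToolsMode = "read-only" | "full" | "off";

function allowedTools(mode: SdkToolsMode): string[] {
  switch (mode) {
    case "read-only":
      // Safe background research: no mutation, no shell.
      return ["Read", "Grep", "Glob", "web_search", "fetch_webpage"];
    case "full":
      // Full autonomy, including edits and sub-agents.
      return ["Read", "Grep", "Glob", "web_search", "fetch_webpage",
              "Bash", "Edit", "Write", "Task"];
    case "off":
      // Listen-only: transcripts are processed with no client-side tools.
      return [];
  }
}
```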
The system supports bidirectional communication:
- Claude Code → Subconscious: Claude Code can address the agent directly in responses. The agent sees everything in the transcript.
- Subconscious → Claude Code: The agent's messages are injected before the next prompt. The user sees them too.
- User → Subconscious: Users can address the agent through Claude Code. The agent responds on the next sync cycle.
This is designed as an ongoing dialogue, not one-way surveillance.
Each Claude Code session gets its own Letta conversation thread. This provides:
- Session-scoped context: Messages within a session stay in that conversation
- Shared memory: All conversations feed into the same agent brain (memory blocks are global)
- Parallel sessions: Multiple Claude Code sessions can run simultaneously, each with their own conversation, all updating the same agent
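The session-to-conversation mapping can be sketched as a small state record. The field names and reuse logic here are illustrative, not the plugin's actual state format:

```typescript
// Hypothetical session state: many conversations, one agent brain.
interface SessionState {
  agentId: string;                        // the single shared agent
  conversations: Record<string, string>;  // sessionId -> conversationId
}

// Reuse an existing conversation for this session, else create one.
function conversationFor(
  state: SessionState,
  sessionId: string,
  create: () => string,
): string {
  return (state.conversations[sessionId] ??= create());
}
```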
The system requires only one credential (LETTA_API_KEY) and handles everything else:
- Auto-imports a bundled default agent if none configured
- Auto-detects available models on the Letta server
- Auto-selects the best available model if the configured one isn't present
- Creates conversations automatically per session
- Manages all state files without user intervention
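The model auto-selection step can be sketched as a preference-ordered fallback. The preference list below is purely illustrative — the actual ordering is not specified here:

```typescript
// Pick the configured model if the server lists it; otherwise fall
// back through an illustrative preference order, then to whatever
// the server has.
function selectModel(available: string[], configured?: string): string | undefined {
  if (configured && available.includes(configured)) return configured;
  const preferred = ["anthropic/claude-sonnet-4-5"]; // assumed ordering
  return preferred.find((m) => available.includes(m)) ?? available[0];
}
```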
Session Start
→ Agent notified with project path, session ID, timestamp
→ Legacy CLAUDE.md content cleaned up
→ Conversation created (or existing one reused)
→ TTY banner displayed: agent name, model, mode, URL
Before Each Prompt
→ Memory blocks fetched (diffs computed against last snapshot)
→ New agent messages retrieved
→ Content injected via stdout as XML
→ State snapshot updated
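The "diffs computed against last snapshot" step above reduces to comparing block values and reporting only what changed. A minimal sketch, assuming a snapshot is a flat label-to-value map:

```typescript
type Snapshot = Record<string, string>; // block label -> block value

// Return the labels of blocks whose value changed (or is new) since
// the last snapshot; unchanged blocks are never re-injected.
function changedBlocks(prev: Snapshot, next: Snapshot): string[] {
  return Object.keys(next).filter((label) => prev[label] !== next[label]);
}
```

In `full` mode these diffs are what Claude Code sees on every prompt after the first, keeping the injected payload small.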
Before Each Tool Use
→ Quick check for new messages or memory changes
→ If updates found, inject as additionalContext
→ Silent no-op if nothing changed (fast path)
After Each Response
→ Full transcript extracted from JSONL session file
→ Formatted as XML and written to temp payload file
→ Background worker spawned (detached, non-blocking)
→ Worker sends payload to agent via Letta Code SDK
→ Agent processes transcript with tool access
→ Agent updates memory blocks as needed
→ Worker updates state file on success
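The detached, non-blocking worker spawn above can be sketched with Node's `child_process`. The worker script name and payload layout are hypothetical:

```typescript
import { spawn } from "node:child_process";
import { mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Write the XML transcript payload to a temp file, then hand it to a
// background worker that outlives the hook process.
function dispatchTranscript(xmlPayload: string): string {
  const dir = mkdtempSync(join(tmpdir(), "subconscious-"));
  const payloadPath = join(dir, "payload.xml");
  writeFileSync(payloadPath, xmlPayload);

  // Detached + unref'd so the Stop hook returns immediately;
  // "worker.js" is a hypothetical worker entry point.
  const worker = spawn(process.execPath, ["worker.js", payloadPath], {
    detached: true,
    stdio: "ignore",
  });
  worker.unref();
  return payloadPath;
}
```

Because the hook only writes a file and spawns, the user's next prompt is never blocked on transcript delivery.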
On session start (TTY output — not captured by Claude):
👁️ Subconscious connecting...
Agent: Subconscious (agent-xxxx)
Model: anthropic/claude-sonnet-4-5
Mode: whisper | SDK Tools: read-only
🔗 https://app.letta.com/agents/agent-xxx?conversation=conv-xxx
Before prompts (injected into Claude's context):
<letta_message from="Subconscious" timestamp="2026-01-26T20:37:14+00:00">
You've asked about error handling in async contexts three times this week.
Consider reviewing error handling architecture holistically.
</letta_message>
- No files written to disk (no CLAUDE.md modification)
- No blocking delays (transcript processing is fully async)
- No configuration files to maintain (auto-managed)
- No popups or console windows on Windows (silent launcher)
- User installs plugin via marketplace or clones repo
- User sets the `LETTA_API_KEY` environment variable
- User starts a Claude Code session
- Plugin auto-imports the bundled Subconscious agent
- Agent ID saved to `~/.letta/claude-subconscious/config.json`
- First few sessions: agent observes but has minimal context to whisper
- Over time: agent accumulates preferences, project knowledge, patterns
- Eventually: agent proactively surfaces relevant context before each prompt
- User works in Project A → conversations stored in `project-a/.letta/claude/`
- User switches to Project B → conversations stored in `project-b/.letta/claude/`
- Same agent brain serves both → memory blocks shared across projects
- Agent develops cross-project awareness over time
- User creates custom agent on Letta platform (or via ADE)
- User sets `LETTA_AGENT_ID` in the environment or `.envrc`
- Plugin uses that agent instead of the default
- Agent's own memory architecture and system prompt apply
- User runs their own Letta server
- User sets `LETTA_BASE_URL` to their server address
- Plugin auto-detects available models on that server
- All API calls route to the self-hosted instance
- Hook-based integration: Bound to Claude Code's plugin hook lifecycle (SessionStart, UserPromptSubmit, PreToolUse, Stop). Cannot intercept arbitrary events.
- Stdout-only injection: All context delivery to Claude Code happens via stdout XML. No direct memory manipulation of Claude Code's context window.
- Async transcript delivery: Transcripts are sent after Claude responds, not during. The agent always observes one response behind.
- Hook timeouts: SessionStart and PreToolUse have 5s limits. UserPromptSubmit has 10s. Stop has 120s. Operations must complete within these windows (or be delegated to background workers).
- Letta Platform required: Agent hosting, memory storage, conversation management, and model inference all depend on the Letta server (cloud or self-hosted).
- Node.js ≥ 18: Runtime requirement for all hook scripts.
- Letta Code SDK: The `@letta-ai/letta-code-sdk` package is required for transcript delivery with tool access.
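The hook time budgets listed above can be respected by racing the real work against a deadline, so a slow Letta round-trip degrades to a no-op instead of stalling Claude Code. A minimal sketch (the fallback strategy is an assumption):

```typescript
// Race the hook's work against its time budget; resolve with a safe
// fallback value if the deadline fires first.
function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  const deadline = new Promise<T>((resolve) =>
    setTimeout(() => resolve(fallback), ms),
  );
  return Promise.race([work, deadline]);
}
```

Anything that cannot fit in the budget (like full transcript processing) is delegated to a background worker instead.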
- Cold start: Agent starts with minimal context. Takes several sessions to accumulate useful knowledge.
- One agent per config: A single agent ID is stored globally. Per-project agents require explicit `LETTA_AGENT_ID` overrides.
- State locality: Conversation mappings are stored in the project directory (`.letta/claude/`). Moving projects or changing working directories loses conversation continuity.
- No real-time streaming: Agent guidance arrives at prompt boundaries and tool-use boundaries — not mid-generation.
- Not a replacement for Claude Code: Subconscious observes and advises — it does not take over coding tasks (unless SDK tools mode is set to "full").
- Not a code generation tool: The agent's purpose is context and memory, not producing code artifacts.
- Not a CLAUDE.md manager: Deliberately avoids writing to disk. Earlier versions synced to CLAUDE.md; this was removed in favor of stdout injection.
- Not a conversation log: The agent processes and forgets transcripts — it extracts signal into memory blocks, not verbatim storage.
- Not a real-time pair programmer: Communication is asynchronous and batched at hook boundaries, not interactive.
- Not model-locked: Supports any model available on the Letta server (OpenAI, Anthropic, Google, ZAI). Auto-selects if configured model unavailable.
- Registration: `hooks/hooks.json` defines four lifecycle hooks
- Execution: Each hook invokes a TypeScript script via `tsx` (wrapped by `silent-npx.cjs` for cross-platform support)
- I/O contract: Hooks receive JSON on stdin and produce output on stdout (XML for context injection, JSON for PreToolUse)
Six REST endpoints consumed:
| Endpoint | Method | Purpose |
|---|---|---|
| `/conversations/` | POST | Create conversation for session |
| `/conversations/{id}/messages` | GET | Fetch agent messages |
| `/agents/{id}` | GET | Fetch agent + memory blocks |
| `/agents/{id}` | PATCH | Update tags, model config |
| `/agents/import` | POST | Import agent from .af file |
| `/models/` | GET | List available models |
Letta Code SDK surface consumed:
- `resumeSession()` — Resume a conversation with tool restrictions
- `session.send()` / `session.stream()` — Send transcript, stream response
- Tool permission model: allowed/disallowed tool lists per session
| Variable | Required | Purpose |
|---|---|---|
| `LETTA_API_KEY` | Yes | Authentication with Letta platform |
| `LETTA_MODE` | No | Output mode: whisper, full, off |
| `LETTA_AGENT_ID` | No | Override agent selection |
| `LETTA_BASE_URL` | No | Self-hosted server URL |
| `LETTA_MODEL` | No | Override model selection |
| `LETTA_CONTEXT_WINDOW` | No | Override context window size |
| `LETTA_HOME` | No | Base directory for state files |
| `LETTA_SDK_TOOLS` | No | Tool access level: read-only, full, off |
| Platform | Support | Notes |
|---|---|---|
| macOS | Full | Primary development target |
| Linux | Full | Requires tmpfs workaround if /tmp on separate filesystem |
| Windows | Full | Custom SilentLauncher.exe eliminates console window flashes via PseudoConsole (ConPTY) |
This specification describes the observable product behavior of Claude Subconscious as derived from its source code. It is implementation-agnostic where possible and focuses on what the software does, not how it does it internally.