Date: 2025-12-06
Last Audit: 2025-12-08
Purpose: Compare AI coding agent harnesses for Squad integration requirements
Status: 2 of N harnesses analyzed (Claude Code, Codex CLI)
INSTRUCTION TO CLAUDE: When you verify your own capabilities or discover changes to your harness, you MUST update this document. This is a living comparison matrix, not a static snapshot.
When to update:
- After any capability audit: if you verify claims in `claude-code-vs-squad-harness-user-stories.md`
- When you discover new tools: check if they affect any rating in this matrix
- When you hit a limitation: document it as evidence
- When you successfully use a capability: update if the rating was too low

How to update:
- Change the `Last Audit` date in the header
- Update the relevant rating in the comparison tables
- Add an entry to the Changelog at the bottom
- Cite evidence: tool name, parameter, or test result
Each rating change must include:
- Tool/Parameter: What capability enables this?
- Test Result: Did you verify it works?
- Rating Change: Old → New with justification
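The three evidence requirements above could be checked mechanically before a rating is edited. The following is only a sketch: `RatingChange`, its field names, and `missingEvidence` are hypothetical, and the 1 to 5 range is an assumption read off the rating tables in this document.

```typescript
// Hypothetical record for one rating change; field names are illustrative.
interface RatingChange {
  capability: string;      // row in the comparison table
  toolOrParameter: string; // what capability enables this?
  testResult: string;      // what the live test showed
  oldRating: number;
  newRating: number;
  justification: string;
}

// Returns the names of evidence fields that are empty or out of range.
function missingEvidence(change: RatingChange): string[] {
  const problems: string[] = [];
  if (change.toolOrParameter.trim() === "") problems.push("toolOrParameter");
  if (change.testResult.trim() === "") problems.push("testResult");
  if (change.justification.trim() === "") problems.push("justification");
  for (const rating of [change.oldRating, change.newRating]) {
    // Assumed scale: whole numbers from 1 to 5.
    if (!Number.isInteger(rating) || rating < 1 || rating > 5) {
      problems.push("rating");
      break;
    }
  }
  return problems;
}

// Example built from the parallelism entry recorded in this document.
const parallelismChange: RatingChange = {
  capability: "True parallelism",
  toolOrParameter: "run_in_background + AgentOutputTool",
  testResult: "Live test: background agent returned its output",
  oldRating: 2,
  newRating: 3,
  justification: "Background execution verified on 2025-12-08",
};

console.log(missingEvidence(parallelismChange)); // []
```

An empty result means the change carries all three pieces of evidence; anything else names what still has to be filled in.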
| Dimension | Claude Code | Codex CLI | Factory Droid | Squad Vision |
|---|---|---|---|---|
| Model | Claude Opus 4.5 | GPT-5-based | Claude Sonnet 4.5 | Any (harness-agnostic) |
| Subagent Spawning | ✅ Yes (Task tool) | ❌ No | ❌ No | ✅ Yes (persistent) |
| Subagent Persistence | ❌ Ephemeral (resume FAILS) | N/A | N/A | ✅ Persistent |
| True Parallelism | ✅ Background (verified) | Limited (multi-execute) | | ✅ True async |
| Memory Across Sessions | ❌ No (resume FAILS) | ❌ No (MCP optional) | ❌ No | ✅ Checkpoints |
| Context Refresh | ❌ Stale accumulation | ❌ Stale accumulation | ❌ Stale accumulation | ✅ Hot-swap zone |
| MCP Support | ✅ Yes | ✅ Yes | ❌ No | ✅ Yes (aggregator) |
| Token Budget | ~200K | Unknown | 200K (explicit) | Managed by Context Manager |
| Capability | Claude Code | Codex CLI | Squad Target | Notes |
|---|---|---|---|---|
| Can spawn persistent sub-agents | 2 | 1 (none) | 5 | resume tested — DOES NOT WORK |
| Sub-agents can talk to each other | 1 | 1 | 5 | No change |
| Sub-agents spawn their own agents | 1 | 1 | 5 | No change |
| True parallelism | 3 | 3 | 5 | ✅ VERIFIED: run_in_background + AgentOutputTool works |
| Detect external changes | 2 (poll only) | 3 (poll only) | 5 (event-driven) | No change |
| Memory persists between sessions | 1 | 2 | 5 | resume tested — DOES NOT WORK |
| Context survives compaction intact | 2 | 3 | 5 | No change |
Key Insight (CORRECTED 2025-12-08): Only true parallelism improved (2→3). Resume does not work. Claude Code's average rating is ~1.7 against a Squad target of 5, which strongly validates the Squad value proposition.
Claude Code:
- Size: ~200K tokens
- Stale Accumulation:
  - 28+ `<system-reminder>` tags claiming "running" for dead processes
  - File read cache (frozen content)
  - Git status snapshot (from session start)
  - Tool output accumulation
  - Conversation history (no selective forgetting)
- Token Waste: ~2000 tokens/response on stale system-reminders (95% noise)
- Compaction: Summary-based, loses tool execution order, failure patterns, intermediate reasoning
Codex CLI:
- Size: Unknown (reports "93% context left" after introspection)
- Stale Accumulation:
  - Long instruction blocks (agents.md, README)
  - Tool specs (large but static)
  - Past tool outputs (bloat after acted on)
  - Repo file excerpts (stale if changed)
- Token Waste: Moderate (tool specs dominate)
- Compaction: "Some trimming possible; large prompts may drop older content"
Squad Vision:
- Dynamic Zone: Hot-swap ~4K tokens every turn
- Static Zone: System prompt, CLAUDE.md, ADRs (~15K)
- Conversation Zone: Grows until checkpoint (~180K remaining)
- No Stale Accumulation: Context Manager polls actual state
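The zone layout above can be sketched as a budget calculation plus a per-turn rebuild. This is a minimal illustration, not the Context Manager itself: the 200K total is an assumption borrowed from the Claude Code budget, and `ContextZones`, `conversationBudget`, and `nextTurn` are hypothetical names.

```typescript
// Zone sizes from the list above; the total window is an assumption.
const TOTAL_BUDGET = 200_000;
const STATIC_ZONE = 15_000;
const DYNAMIC_ZONE = 4_000;

// Tokens left for the conversation zone before a checkpoint is needed.
function conversationBudget(): number {
  return TOTAL_BUDGET - STATIC_ZONE - DYNAMIC_ZONE;
}

interface ContextZones {
  staticZone: string;     // system prompt, CLAUDE.md, ADRs
  dynamicZone: string;    // replaced wholesale every turn
  conversation: string[]; // grows until checkpoint
}

// Hot-swap: the previous dynamic zone is dropped, never appended to,
// so stale state cannot accumulate there.
function nextTurn(zones: ContextZones, freshState: string, message: string): ContextZones {
  return {
    staticZone: zones.staticZone,
    dynamicZone: freshState,
    conversation: zones.conversation.concat([message]),
  };
}

console.log(conversationBudget()); // 181000
```

The result of 181,000 matches the "~180K remaining" figure for the conversation zone under the assumed 200K window.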
| Tag | Purpose | Accumulates? |
|---|---|---|
| `<system-reminder>` | Process status, alerts | YES (major problem) |
| `<env>` | Working dir, platform | No (static) |
| `<functions>` | Tool schemas | No (static) |
| `<function_calls>` | My tool invocations | YES |
| `<function_results>` | Tool outputs | YES (major) |
| `<examples>` | Few-shot examples | No (static) |
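Because `<system-reminder>` blocks are the main source of accumulation, one way an external context manager might cut the noise is to keep only the most recent block. The sketch below assumes reminders appear as well-formed `<system-reminder>…</system-reminder>` pairs in a plain-text transcript; `keepLatestReminder` is a hypothetical helper, and nothing here implies the harness exposes a hook for it.

```typescript
// Removes every <system-reminder> block except the last one.
function keepLatestReminder(transcript: string): string {
  const pattern = /<system-reminder>[\s\S]*?<\/system-reminder>/g;
  const matches = transcript.match(pattern);
  if (matches === null || matches.length <= 1) return transcript;

  const total = matches.length;
  let seen = 0;
  return transcript.replace(pattern, (block) => {
    seen += 1;
    // Only the final reminder reflects current state; drop the rest.
    return seen === total ? block : "";
  });
}

const sample =
  "a<system-reminder>shell 1 running</system-reminder>" +
  "b<system-reminder>shell 1 exited</system-reminder>c";
console.log(keepLatestReminder(sample));
// ab<system-reminder>shell 1 exited</system-reminder>c
```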
| Tag | Purpose | Accumulates? |
|---|---|---|
| system/developer/user | Instructions, environment | YES (until trimmed) |
| `<environment_context>` | cwd, sandbox mode | No (static) |
| `<INSTRUCTIONS>` | Repo protocols | No (static) |
| Tool definitions | JSON specs | No (static) |
| Channels | Response routing | No (static) |
- Claude Code: Explicit XML-style tags with clear structure
- Codex CLI: Message-based (system/developer/user) with embedded context
| Capability | Claude Code Tool | Codex CLI Tool |
|---|---|---|
| File read | Read | shell_command (cat) |
| File write | Write, Edit | apply_patch |
| File search | Glob, Grep | shell_command (find, grep) |
| Shell execution | Bash | shell_command |
| Web fetch | WebFetch | shell_command (curl) |
| Browser automation | N/A | mcp__chrome-devtools__* |
| MCP tools | mcp__rube__* | mcp__rube__* |
| Supabase | mcp__supabase__* | mcp__supabase__* |
| Memory | mcp__supermemory__* | mcp__supermemory__* |
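An adapter layer could encode the mapping above as a lookup table. This sketch covers only the rows with a built-in tool on both sides; the tool names are copied from the table, while `Harness`, `Capability`, `TOOL_MAP`, and `toolsFor` are hypothetical names.

```typescript
type Harness = "claude-code" | "codex-cli";
type Capability = "fileRead" | "fileWrite" | "fileSearch" | "shell" | "webFetch";

// Tool names come from the table above; the structure is illustrative.
const TOOL_MAP: Record<Capability, Record<Harness, string[]>> = {
  fileRead:   { "claude-code": ["Read"],          "codex-cli": ["shell_command"] },
  fileWrite:  { "claude-code": ["Write", "Edit"], "codex-cli": ["apply_patch"] },
  fileSearch: { "claude-code": ["Glob", "Grep"],  "codex-cli": ["shell_command"] },
  shell:      { "claude-code": ["Bash"],          "codex-cli": ["shell_command"] },
  webFetch:   { "claude-code": ["WebFetch"],      "codex-cli": ["shell_command"] },
};

// Looks up which tool(s) a harness uses for a given capability.
function toolsFor(harness: Harness, capability: Capability): string[] {
  return TOOL_MAP[capability][harness];
}

console.log(toolsFor("codex-cli", "fileWrite"));    // [ 'apply_patch' ]
console.log(toolsFor("claude-code", "fileSearch")); // [ 'Glob', 'Grep' ]
```

Note how often Codex CLI resolves to `shell_command`: most of its capabilities route through the shell rather than dedicated tools.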
| Tool | Purpose |
|---|---|
| Task | Spawn sub-agents (Explore, Plan, software-engineer, etc.) |
| TodoWrite | Track task progress |
| AskUserQuestion | Multi-choice user queries |
| Skill | Execute slash commands |
| EnterPlanMode | Structured planning flow |
| NotebookEdit | Jupyter notebook editing |
| BashOutput/KillShell | Background process management |
| Tool | Purpose |
|---|---|
| update_plan | Track plan steps |
| view_image | Attach local images |
| list_mcp_resources | MCP resource discovery |
| read_mcp_resource | MCP resource reading |
- Claude Code: Task tool can spawn 15+ specialized agents (Explore, Plan, software-engineer, qa-engineer, etc.)
- Codex CLI: NO subagent spawning capability
```typescript
interface ClaudeCodeAdapter {
  // Invocation
  cli: 'claude -p --output-format stream-json --mcp-config {config}';

  // Capabilities to leverage (VERIFIED 2025-12-08)
  subagents: true;           // Task tool for delegation
  parallelism: 'background'; // ✅ VERIFIED: run_in_background + AgentOutputTool works
  agentResume: false;        // ❌ TESTED: resume parameter DOES NOT WORK

  // Limitations to work around
  staleness: 'high';         // Need Context Manager integration
  memory: 'none';            // ❌ TESTED: resume does not restore prior context
  ephemeral: true;           // ❌ TESTED: Subagents fully ephemeral, no persistence

  // Unique features
  todoTracking: true;        // TodoWrite for progress visibility
  planMode: true;            // EnterPlanMode for complex tasks
  backgroundShells: true;    // ✅ VERIFIED: BashOutput/KillShell work
}
```

```typescript
interface CodexCliAdapter {
  // Invocation
  cli: 'codex exec --json';

  // Capabilities to leverage
  subagents: false;          // No spawning, Squad must manage
  parallelism: 'limited';    // RUBE_MULTI_EXECUTE_TOOL

  // Limitations to work around
  staleness: 'moderate';     // Less noisy than Claude Code
  memory: 'minimal';         // MCP memory optional
  sandboxing: true;          // Must respect approval_policy

  // Unique features
  browserAutomation: true;   // chrome-devtools MCP
  patchEditing: true;        // apply_patch for file changes
}
```

For both harnesses, Squad fills these gaps:
| Gap | Current State | Squad Solution |
|---|---|---|
| Persistent Agents | Ephemeral or none | Persistent sessions with memory |
| True Parallelism | Sequential or fake | Async channels with DAG execution |
| Inter-Agent Communication | Not supported | Lateral channels |
| Context Freshness | Stale accumulation | Hot-swap dynamic zone (ADR-023) |
| Memory Across Sessions | Context reset | Checkpoint system (ADR-017) |
| External Change Detection | Poll only | Parallel Monitor integration |
| Recursive Spawning | Not supported | Engineers spawn their own Scouts |
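The "DAG execution" entry in the table above implies scheduling tasks in dependency order while running independent ones together. A minimal sketch of that planning step follows; it is not Squad's scheduler. `DependencyMap` and `planWaves` are hypothetical names, and the task names are illustrative.

```typescript
// Maps each task name to the tasks it depends on (hypothetical input format).
type DependencyMap = Record<string, string[]>;

// Groups tasks into waves: every task in a wave has all of its dependencies
// completed by earlier waves, so the tasks within one wave could run concurrently.
function planWaves(deps: DependencyMap): string[][] {
  const done: string[] = [];
  const waves: string[][] = [];
  let remaining = Object.keys(deps).sort();

  while (remaining.length > 0) {
    const ready = remaining.filter((task) =>
      deps[task].every((dep) => done.indexOf(dep) !== -1)
    );
    if (ready.length === 0) {
      // Nothing can start: a cycle, or a dependency that is not a known task.
      throw new Error("cycle or unknown dependency among: " + remaining.join(", "));
    }
    waves.push(ready);
    for (const task of ready) done.push(task);
    remaining = remaining.filter((task) => ready.indexOf(task) === -1);
  }
  return waves;
}

const plan = planWaves({
  scout: [],
  engineer: ["scout"],
  docs: ["scout"],
  qa: ["engineer"],
});
console.log(JSON.stringify(plan)); // [["scout"],["docs","engineer"],["qa"]]
```

In this example `docs` and `engineer` share a wave because both depend only on `scout`, which is where true parallelism would pay off.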
Based on analysis:
- Claude Code (HIGH) - Has subagent spawning (can delegate), needs Squad for persistence + context management
- Codex CLI (MEDIUM) - No subagent spawning (Squad must manage all delegation), good MCP support
- Factory CLI (PENDING) - Awaiting introspection report
- Gemini CLI (PENDING) - Awaiting introspection report
- Jules CLI (PENDING) - Awaiting introspection report
- Cursor/Windsurf (PENDING) - GUI-based, different integration pattern
Claude Code:
Can spawn persistent sub-agents: 2 (resume TESTED - DOES NOT WORK)
Sub-agents can talk to each other: 1 (no lateral channels)
Sub-agents spawn their own agents: 1 (only Manager spawns)
True parallelism: 3 (run_in_background + AgentOutputTool) ✅ VERIFIED
Detect external changes: 2 (must poll, no events)
Memory persists between sessions: 1 (resume TESTED - DOES NOT WORK)
Context survives compaction: 2 (summary only)
Average: 1.7 (only parallelism improved)
Codex CLI:
Can spawn persistent sub-agents: 1 (no sub-agent tools)
Sub-agents can talk to each other: 1 (not supported)
Sub-agents spawn their own agents: 1 (not supported)
True parallelism: 3 (limited via multi-execute)
Detect external changes: 3 (must poll via shell)
Memory persists between sessions: 2 (MCP optional)
Context survives compaction: 3 (some trimming)
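The averages can be reproduced from the Claude Code and Codex CLI rating summaries above. The ratings below are transcribed in the same row order; the Codex CLI average is not stated in this document and is simply what the arithmetic yields.

```typescript
// Ratings in table order: persistent sub-agents, lateral communication,
// recursive spawning, parallelism, change detection, memory, compaction.
const claudeCodeRatings = [2, 1, 1, 3, 2, 1, 2];
const codexCliRatings = [1, 1, 1, 3, 3, 2, 3];

// Arithmetic mean of a list of ratings.
function average(ratings: number[]): number {
  return ratings.reduce((sum, r) => sum + r, 0) / ratings.length;
}

console.log(average(claudeCodeRatings).toFixed(1)); // 1.7
console.log(average(codexCliRatings).toFixed(1));   // 2.0
```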
Next Steps:
- Collect Factory CLI introspection
- Collect remaining harness introspections (Gemini, Jules, Cursor, Windsurf)
- Build adapter interfaces per harness
- Implement ADR-009 (Harness-Agnostic Adapters)
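As a starting point for the adapter work, the two adapter interfaces earlier in this document might be reduced to a shared capability shape. This is a sketch under assumptions, not the ADR-009 design: `HarnessCapabilities` and `delegationOwner` are hypothetical names, and only fields that both interfaces already declare are included.

```typescript
// Hypothetical common shape distilled from the two adapter interfaces.
interface HarnessCapabilities {
  name: string;
  subagents: boolean;
  parallelism: "background" | "limited";
  staleness: "high" | "moderate";
  memory: "none" | "minimal";
}

const claudeCodeCaps: HarnessCapabilities = {
  name: "Claude Code",
  subagents: true,
  parallelism: "background",
  staleness: "high",
  memory: "none",
};

const codexCliCaps: HarnessCapabilities = {
  name: "Codex CLI",
  subagents: false,
  parallelism: "limited",
  staleness: "moderate",
  memory: "minimal",
};

// Decides who owns delegation: the harness can delegate through its own
// sub-agents, otherwise Squad has to manage all of it.
function delegationOwner(harness: HarnessCapabilities): "harness" | "squad" {
  return harness.subagents ? "harness" : "squad";
}

console.log(delegationOwner(claudeCodeCaps)); // harness
console.log(delegationOwner(codexCliCaps));   // squad
```

Keeping the shared shape to plain data makes it easy to add the pending harnesses once their introspection reports arrive.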
Document Author: Claude (Manager role, Claude Code instance)
Related: ADR-009, harness-introspection-prompt.md
Added 2025-12-08 after user requested deeper investigation.
UPDATED 2025-12-08: Live testing shows resume DOES NOT WORK as documented.
From the Task tool definition:
resume: string
"Optional agent ID to resume from. If provided, the agent will
continue from the previous execution transcript."
Test procedure:
- Spawned background agent f0501caf → returned "BACKGROUND_TEST_SUCCESS"
- Attempted to resume with a new agent using `resume: "f0501caf"`
- Asked the new agent: "What was your previous response?"
Result:
"NO_PREVIOUS_CONTEXT - This is the start of our conversation -
I have no record of previous messages or responses from earlier in this session."
| Documented Behavior | Actual Behavior |
|---|---|
| "Continue from previous execution transcript" | Resumed agent replied "NO_PREVIOUS_CONTEXT" |
| Implies agent has memory | Agent has NO access to previous responses |
What resume actually does (verified via transcript inspection):
agent-f0501caf.jsonl contains:
Line 1: "BACKGROUND_TEST_SUCCESS..." (original run)
Line 2: "NO_PREVIOUS_CONTEXT..." (resumed run, SAME FILE)
- ✅ Both runs append to the SAME transcript file
- ✅ Both runs use the SAME agentId
- ❌ The resumed agent does NOT see its previous output in context
This is AUDIT TRAIL persistence, not MEMORY persistence.
The transcript is for the parent session's reference, not the subagent's context window.
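Since the transcript survives on disk but is not loaded into the subagent, a parent session that wants continuity would have to replay it explicitly in the next prompt. The sketch below assumes each JSONL line is an object with a `text` field; the real schema of `agent-f0501caf.jsonl` is not documented here, so treat that field and the `buildReplayPrompt` helper as hypothetical.

```typescript
// Builds a prompt that carries earlier responses forward by hand.
function buildReplayPrompt(jsonl: string, question: string): string {
  const previous: string[] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    try {
      const entry = JSON.parse(line);
      // Assumed schema: { "text": "..." } per line.
      if (entry && typeof entry.text === "string") previous.push(entry.text);
    } catch {
      // Skip lines that are not valid JSON rather than failing the replay.
    }
  }
  const history = previous.map((text, i) => `[${i + 1}] ${text}`).join("\n");
  return `Previous responses:\n${history}\n\nNew request: ${question}`;
}

const transcript =
  '{"text":"BACKGROUND_TEST_SUCCESS"}\n' +
  '{"text":"NO_PREVIOUS_CONTEXT"}\n';
console.log(buildReplayPrompt(transcript, "What was your previous response?"));
```

This moves memory into the prompt, which costs tokens on every resumed run; it is a workaround for audit-trail-only persistence, not a substitute for persistent agents.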
- Rating: 2 — Subagents ARE ephemeral, resume does not provide persistence
- Previous upgrade to 3 was based on documentation, not testing
- Live test disproves the documented behavior
Subagents remain ephemeral:
- No persistent memory
- No transcript continuation (tested and failed)
- No lateral communication
- No recursive spawning
| Date | Auditor | Changes | Evidence |
|---|---|---|---|
| 2025-12-08 | Claude (Opus 4.5) | DISCOVERY: resume = transcript persistence, NOT context persistence | Transcript inspection: both runs in same file, but agent has no memory |
| 2025-12-08 | Claude (Opus 4.5) | Subagent Persistence: Confirmed ❌ Ephemeral | Live test disproved resume functionality |
| 2025-12-08 | Claude (Opus 4.5) | Memory: Confirmed rating 1 (no persistence) | Live test disproved resume functionality |
| 2025-12-08 | Claude (Opus 4.5) | True Parallelism: ✅ VERIFIED rating 3 | Live test: run_in_background + AgentOutputTool works |
| 2025-12-08 | Claude (Opus 4.5) | Background Shells: ✅ VERIFIED | Live test: BashOutput retrieved shell output |
| 2025-12-08 | Claude (Opus 4.5) | Added Self-Update Protocol section | User instruction |
| 2025-12-08 | Claude (Opus 4.5) | Added Deep Dive: Agent Resume section | User requested deeper investigation |
| 2025-12-08 | Claude (Opus 4.5) | Lesson: Test before trusting documentation | |
| 2025-12-06 | Claude (Opus 4.5) | Initial document creation | Harness introspection exercise |