Research: Claude Code Parallel Execution Primitives for Operator Agent Enhancement
title: Parallel Execution Primitives for Claude Code Operator Agents
date: 2026-04-12
project: kampus
feature: claude-code-parallel-primitives
type: research
status: complete

Parallel Execution Primitives for Claude Code Operator Agents

Executive Summary

Claude Code has exactly three primitives for parallel execution -- subagents, git worktrees, and Agent Teams -- and the most important thing about all three is what they do not provide: any implicit safety net for concurrent writes. Parallel subagents sharing a working directory produce silent file corruption. No warnings. No conflict markers. No errors. Last write wins at the OS level, and the Edit tool's string replacement fails unpredictably when line counts shift between agents. The git index lock (fatal: Unable to create '.git/index.lock': File exists) fires even when agents write to completely disjoint files. Claude Code will not protect you. The operator must enforce isolation boundaries, or the operator ships corruption.

This is the central finding and it has a direct corollary: the only sanctioned patterns for parallel code generation are strict file partitioning (disjoint file ownership per agent) or worktree isolation (each agent gets its own branch and working directory). Read-only parallelism is safe without either. Everything else is a race condition waiting to surface.

For the operator specifically, the lowest-risk path is subagent fan-out with worktree isolation for independent tasks, where the state machine's dependency graph determines what can run concurrently. This preserves every guarantee the operator already makes -- circuit breaker, retry logic, state machine authority -- while cutting wall-clock time proportional to the parallelism factor. Agent Teams, the peer-to-peer coordination layer shipped in February 2026 behind an experimental flag, is architecturally interesting and operationally fragile: no session resumption, task status synchronization bugs, delegation compliance failures, and 20 documented issues across 10 official and 10 community-discovered reports. Anthropic's launch of Managed Agents (April 2026, public beta) as the production-grade multi-agent offering suggests Agent Teams may remain a power-user local feature indefinitely.

Token costs scale N-times for N parallel agents, compounded by prompt cache misses -- parallel agents cold-start independently rather than sharing cache. Five parallel subagents each processing 100K tokens of shared system prompt pay $3.125 versus $0.825 for sequential execution on that prefix alone. That is a 3.8x penalty on cache alone, before counting anything else. Model tiering -- Haiku for exploration, Sonnet for implementation, Opus only for orchestration -- is the highest-leverage cost optimization, yielding 40-50% savings. The 20x cost premium documented in Anthropic's harness design research comes from longer autonomous sessions with evaluation loops, not from parallelism itself; adding fan-out to the harness pattern multiplies the sequential harness cost by roughly 1.5-2x.

The operator's hybrid architecture -- deterministic state machine core plus agentic skill execution shell -- is exactly the pattern the industry has converged on for production multi-agent systems. David Fetterman named it "deterministic core, agentic shell." Stately AI ships XState bindings for it. Every serious production deployment in 2025-2026 uses some version of it. Parallelism is not an architectural change for the operator. It is a scheduling optimization within the existing architecture. The state machine continues to own task ordering and transitions; the operator gains the ability to dispatch multiple independent transitions simultaneously.


1. Agent Teams Architecture

Agent Teams shipped in February 2026 as a research preview, gated behind CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1. One Claude Code session (the team lead) spawns independent teammate agents, each with its own context window and full tool access. Four components: a team lead, teammates, a shared task list (JSON files at ~/.claude/tasks/{team-name}/), and a per-agent mailbox system (JSON inboxes at ~/.claude/teams/{team-name}/inboxes/).

What makes Agent Teams fundamentally different from subagents is not scale but topology. Subagents are spokes reporting to a hub. Agent Teams are a mesh.

  • Peer-to-peer messaging -- teammates communicate directly, not just through the parent. This enables adversarial debugging, collaborative research, and self-organizing patterns that subagents structurally cannot support.
  • Shared task list with dependency tracking -- tasks have pending/in-progress/completed states, file-locking for claims, and automatic unblocking when dependencies complete.
  • Persistent teammates -- each teammate is a full Claude Code session that persists for the team's duration, unlike ephemeral subagent invocations.
  • Quality gates via hooks -- TeammateIdle, TaskCreated, TaskCompleted hooks enable programmatic governance (e.g., requiring tests to pass before task closure).
  • Plan approval workflow -- teammates can be required to plan before implementing, with the lead reviewing and approving plans.

The stability picture is honest and unflattering. Agent Teams works well for parallel research, independent module development, and competing-hypothesis debugging. It works badly for anything requiring reliability. Twenty documented issues span session management fragility, VS Code integration breakage, tmux race conditions, delegation compliance failures, and task status synchronization bugs. The feature remains behind an experimental flag with no announced GA timeline. The recommended sweet spot from community practice is 3-5 teammates with 5-6 tasks each.

Cost profile: A 3-agent team uses 3-4x the tokens of a single session. Plan approval phases push this to ~7x. The cost is not the problem. The reliability is.


2. Worktree Isolation Mechanics

Git worktrees are how you make parallel code generation safe. They create independent working directories with their own branch, index, and files while sharing the same repository history. Everything else -- file partitioning, careful prompting, hoping agents stay in their lane -- is a brittle approximation of what worktrees give you by construction.

Three entry points:

| Method | Use Case |
|---|---|
| `claude --worktree <name>` | User sessions |
| `isolation: worktree` in subagent frontmatter | Automated parallel code generation |
| EnterWorktree tool | Mid-session isolation |

Worktrees are created at <repo>/.claude/worktrees/<name>/ on a branch named worktree-<name>, branching from origin/HEAD. Base branch selection is not configurable via flag -- requires git remote set-head or a WorktreeCreate hook. This is a real limitation the operator must work around.

Merge semantics: There is no automatic merge. This is a feature, not a missing feature. Worktree branches must be integrated manually via git merge, git cherry-pick, gh pr create, or by asking Claude in the main session. Conflicts surface only at merge time and use standard git conflict resolution. The team lead (or operator) coordinates sequential merging.

Cleanup: Worktrees with no changes are auto-removed when the subagent finishes. Changed worktrees persist for manual review. Orphaned worktrees are cleaned up at startup after 30 days (configurable) if they have no uncommitted changes, untracked files, or unpushed commits.

Monorepo considerations: Worktrees need dependency installation since node_modules is absent. worktree.symlinkDirectories symlinks specified directories from the main repo. worktree.sparsePaths uses git sparse-checkout for large monorepos. Known bug: Claude Code's atomic write pattern replaces symlinks with regular files -- tracked in issue #40857.

The constraint operators must internalize: Subagent results return as text summaries, not diffs or commits. The parent must explicitly interact with the worktree's branch to integrate changes. The operator needs explicit merge logic after parallel subagent completion. There is no shortcut here.
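What that merge logic could look like is straightforward to sketch. The helper below is a minimal illustration, assuming the documented worktree-<name> branch naming and that the operator tracks which branches its subagents produced; the function and its error handling are this sketch's own, not an operator API.

```typescript
// Sketch: serially merge worktree branches after parallel subagents complete.
// The branch list comes from the operator's own bookkeeping.
import { execFileSync } from "node:child_process";

function git(...args: string[]): string {
  return execFileSync("git", args, { encoding: "utf8" });
}

export function mergeWorktreeBranches(branches: string[]): string[] {
  const needsManualResolution: string[] = [];
  for (const branch of branches) {
    try {
      // Serial merges mean only one process touches .git/index at a time,
      // so index.lock contention cannot occur during integration.
      git("merge", "--no-ff", branch, "-m", `Merge ${branch}`);
    } catch {
      // Back out and queue the branch for explicit conflict resolution
      // (by a human, or by Claude in the main session).
      git("merge", "--abort");
      needsManualResolution.push(branch);
    }
  }
  return needsManualResolution;
}

// Example: mergeWorktreeBranches(["worktree-auth-task", "worktree-api-task"]);
```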


3. Parallel Write Safety

Parallel subagents in Claude Code are genuinely concurrent -- they fire simultaneously, not sequentially. Without isolation, they share the same working directory. This is where every naive "just run N agents" approach breaks.

Three findings, in order of severity:

1. Concurrent writes to the same file produce silent corruption. Last-write-wins at the OS level. The Edit tool's string replacement fails unpredictably when line counts change between agents. No warnings, no errors, no conflict markers. The file is simply wrong, and nobody tells you.

2. Git index lock contention causes commit failures. Git's .git/index.lock mutex means concurrent git add or git commit calls fail with fatal: Unable to create '.git/index.lock': File exists -- even when agents write to completely disjoint files. This is not a Claude Code bug. It is how git works. Issue #28823 documents the pattern.

3. Claude Code provides no implicit safety net. No filesystem-level locking. No copy-on-write. No transaction semantics. No warnings when you fire multiple subagents at the same working directory. The system trusts you to know what you are doing.

Five sanctioned patterns for safe parallel code generation:

| Pattern | How It Works | When to Use |
|---|---|---|
| Strict file partitioning | Each agent owns disjoint files; commits serialized after completion | Different modules, no shared files |
| Worktree isolation | Each agent gets its own branch/directory/index | Any overlapping writes |
| Read-only parallel, sequential writes | Parallel research with read-only tools; parent applies changes | Multiple perspectives on same files |
| File-based coordination with post-merge | Agents in separate clones; git sync forces conflict resolution | Large-scale (Anthropic's 16-agent C compiler approach) |
| Agent Teams with shared task list | Teammates in own worktrees; file-locked task claims | Complex multi-agent coordination |

The lesson from Anthropic's own 16-agent C compiler build is that even Anthropic does not trust its agents to share a working directory. They used separate clones with git sync. The operator should not be more trusting than its maker.


4. Orchestration Patterns

The orchestration landscape divides into three categories, and the third is eating the other two.

State-machine-driven orchestration -- what the operator currently does -- provides explicit transitions, full auditability, and testable-without-LLM determinism, but cannot adapt to unforeseen situations. LLM-driven orchestration handles unstructured problems but compounds failure rates: a 10-step process at 99% per-step reliability delivers 90.4% overall reliability. Stack ten agents and the system fails roughly once in ten runs. The hybrid pattern -- deterministic core with agentic shell -- is the dominant 2025-2026 production architecture, and the operator already implements it.

Anthropic's own research delivers the sharpest indictment of pure multi-agent orchestration: 57% of multi-agent project failures originate in orchestration design, not agent capability. The synthesis/fan-in step is where systems die -- an aggregator without explicit merge policies produces bloated or arbitrary output. The industry is learning what the operator already knows: the machine must own the rules.

Cross-framework fan-out/fan-in comparison:

| System | Fan-Out | Fan-In | Key Differentiator |
|---|---|---|---|
| Claude Code subagents | Multiple Agent tool calls in same message | Parent reads return strings, synthesizes | Simplest; no inter-child communication |
| Claude Code Agent Teams | Team lead spawns teammates + shared task list | Lead synthesizes; peers message each other | Peer-to-peer communication |
| LangGraph | Send API to multiple nodes | Reducer functions merge typed state | Structured state with explicit reducers |
| OpenAI Agents SDK | asyncio.gather() for parallel calls | Manager combines outputs | Handoffs + agents-as-tools |
| Google ADK | ParallelAgent named type | Synthesizer agent post-parallel | Most explicit primitives |
| Anthropic Managed Agents | Multiple stateless brains on same session | Shared event log | Brain/Hands/Session decoupling |

The XState/Stately Agent pattern maps directly to the operator's design. The machine enforces rules while the agent handles creative work. Two bridging tools -- get_current_state and take_action -- translate between the agent and machine domains. David Fetterman's "Deterministic Core, Agentic Shell" blog post names the pattern the operator has been running. The operator is not an experiment. It is the architecture the industry converged on.


5. Token Cost Implications

Cost scaling is multiplicative, not additive. This is the fact that every "just parallelize it" proposal must confront honestly.

N parallel subagents cost N-times, plus coordination overhead. The overhead is not fixed -- it depends on which primitive you use and how you manage the prompt cache.

| Scenario | Relative Token Cost | Wall-Clock Speed | Cache Efficiency |
|---|---|---|---|
| Single agent, sequential | 1.0x (baseline) | Slowest | Best |
| N subagents, parallel | ~Nx + overhead | Fastest | Poor (parallel cold-starts) |
| N subagents, staggered | ~Nx, better cache | Moderate | Good (serial cache hits) |
| Agent team (3 members) | 3-7x | Fast | Independent per agent |
| Agent team (5 members) | 5-15x | Fastest | Independent per agent |
| Sequential multi-agent harness | 10-22x | Slow but thorough | Moderate |

Prompt cache behavior is the hidden cost multiplier that nobody talks about. Parallel agents cold-start independently rather than sharing cache. Five parallel subagents each processing 100K tokens of shared system prompt pay ~3.8x more than sequential execution on that prefix alone ($3.125 vs. $0.825). Staggering launches -- waiting for the first response before firing subsequent agents -- restores cache benefits but sacrifices the wall-clock speed that was the entire point of parallelizing. This is a genuine tradeoff with no free lunch.
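The arithmetic behind that 3.8x figure is worth making explicit. The sketch below assumes a $5-per-million-token input price with the standard 1.25x cache-write and 0.1x cache-read multipliers; the prices are an assumption, the shape of the calculation is not.

```typescript
// Cache-miss penalty for parallel fan-out. Pricing assumptions:
// $5/M input tokens, cache write at 1.25x base, cache read at 0.1x base.
const BASE = 5 / 1_000_000;       // $ per input token
const CACHE_WRITE = 1.25 * BASE;  // uncached pass over the shared prefix
const CACHE_READ = 0.1 * BASE;    // cached pass over the shared prefix

const agents = 5;
const prefixTokens = 100_000;

// Parallel launch: every agent cold-starts and pays the cache-write rate.
const parallelCost = agents * prefixTokens * CACHE_WRITE; // $3.125

// Sequential (or staggered) launch: one cache write, then cached reads.
const sequentialCost =
  prefixTokens * CACHE_WRITE +
  (agents - 1) * prefixTokens * CACHE_READ; // $0.825

console.log((parallelCost / sequentialCost).toFixed(1) + "x"); // "3.8x"
```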

Model tiering is where the real savings live:

| Role | Recommended Model | Cost vs. Opus |
|---|---|---|
| Orchestrator / complex reasoning | Opus 4.6 | 1.0x |
| Code generation / coordination | Sonnet 4.6 | 0.6x |
| Exploration / search / simple tasks | Haiku 4.5 | 0.2x |

A 40-50% cost reduction from model tiering alone. Routing exploration subagents to Haiku at $0.25/$1.25 per million tokens versus Opus at $15/$75 is not an optimization. It is the difference between sustainable and ruinous.

Practical strategies for the operator:

  • Keep spawn prompts minimal -- everything in the prompt inflates every subagent's context
  • Cap extended thinking (MAX_THINKING_TOKENS=10000) for routine tasks
  • Move stable instructions to skills (loaded on-demand vs. CLAUDE.md loaded at session start)
  • Clean up agent teams promptly -- idle teammates still consume tokens via polling
  • Use subagents over Agent Teams for focused tasks (lower overhead, higher reliability)

Recommendation Matrix

Which Primitive for Which Use Case

The operator's primitive selection is not a preference. It is a safety decision.

| Use Case | Primitive | Isolation | Model | Risk Level |
|---|---|---|---|---|
| Parallel research / analysis (no writes) | Subagents | None needed | Haiku | Low |
| Parallel code generation, disjoint files | Subagents | File partitioning | Sonnet | Low-Medium |
| Parallel code generation, possible overlap | Subagents | Worktree | Sonnet | Medium |
| QA on multiple completed tasks | Subagents | Worktree (read from branches) | Sonnet | Low |
| Complex multi-step with inter-agent discussion | Agent Teams | Worktree (implicit) | Mixed | High (experimental) |
| Single-file changes from multiple perspectives | Sequential subagents | None | Sonnet | Low |
| Large-scale parallel (10+ agents) | Manual worktrees or Docker containers | Full isolation | Mixed | Medium-High |

Primitive Comparison for Operator Design

The question is not which primitive is "best." The question is which primitive the operator can trust with its reliability contract.

| Dimension | Subagents | Agent Teams | Manual Worktrees |
|---|---|---|---|
| Readiness | Production | Experimental | Production |
| Coordination | Parent manages all | Self-coordinating | Human manages |
| Communication | Results to parent only | Peer-to-peer + task list | None |
| Cost | N-times + overhead | 3-7x per 3 agents | Variable |
| Operator integration | Natural (Agent tool calls) | Requires env var opt-in | Manual orchestration |
| Reliability | High | Medium (known issues) | High |
| Session resumption | N/A (ephemeral) | Not supported | N/A |
| Merge burden | Parent merges worktree branches | Lead merges | Human merges |

Subagents with worktree isolation are the only combination that is both production-ready and a natural fit for the operator's Agent tool calls. Agent Teams becomes interesting when -- and only when -- it exits experimental status and gains session resumption. Manual worktrees are the escape hatch for anything subagents cannot handle.


What Can Be Built Now vs. What Needs Experimentation

Build Now

These use production-ready primitives and well-understood patterns. No open research questions block them.

  1. Dependency graph analysis before dispatch. Parse tasks.md for independent task sets. Tasks with no shared dependencies can be fanned out in a single Agent tool response. The state machine already knows the dependency graph. Use it. A sketch of this wave computation appears after this list.

  2. Subagent fan-out for independent tasks with worktree isolation. Each parallel subagent gets isolation: worktree. The operator fires multiple Agent tool calls in one message. Results return as text summaries; the operator merges worktree branches sequentially. This is the bread-and-butter pattern.

  3. Read-only parallel research. Subagents with tools restricted to Read, Grep, Glob can safely run in parallel without isolation. No writes, no races. Use for codebase analysis, audit, or information gathering phases.

  4. Model tiering for cost optimization. Route exploration subagents to Haiku, implementation to Sonnet, keep Opus for the operator itself. This is not optional if the operator runs at any scale.

  5. Sequential commit serialization. After parallel subagents complete (in their worktrees), the operator merges branches one at a time. Git index lock contention is structurally impossible when merges are serial.

  6. Circuit breaker per parallel branch. Each parallel subagent's failure is isolated. If one fails, the operator retries that task independently without affecting completed parallel tasks. The circuit breaker pattern the operator already owns extends naturally to parallel branches.
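The wave computation referenced in item 1 is plain graph bookkeeping. A minimal sketch, assuming each task parsed from tasks.md carries an explicit dependsOn list (the Task shape and the parsing are assumptions of this sketch):

```typescript
// Sketch: group tasks into dispatch waves. Every task in a wave has all of
// its dependencies satisfied by earlier waves, so tasks within one wave are
// independent and can be fanned out as parallel subagents.
interface Task {
  id: string;
  dependsOn: string[]; // hypothetical field parsed from tasks.md
}

export function planWaves(tasks: Task[]): Task[][] {
  const done = new Set<string>();
  let remaining = [...tasks];
  const waves: Task[][] = [];

  while (remaining.length > 0) {
    const ready = remaining.filter((t) => t.dependsOn.every((d) => done.has(d)));
    if (ready.length === 0) {
      throw new Error("Dependency cycle or unknown task id in tasks.md");
    }
    waves.push(ready);
    for (const t of ready) done.add(t.id);
    remaining = remaining.filter((t) => !done.has(t.id));
  }
  return waves; // each inner array is one parallel fan-out
}
```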

Needs Experimentation

These are feasible but unvalidated in the operator context. Each carries an empirical question that cannot be answered by reading docs.

  1. Fan-in merge policies. When parallel tasks return, what happens if one fails and others succeed? What if results conflict? The operator needs explicit rules. "2 of 3 succeeded" is not a state the current machine represents. These policies must be designed, implemented, and stress-tested with real runs.

  2. Worktree merge conflict resolution at scale. How often do tasks the operator categorizes as "independent" actually produce merge conflicts? How well does Claude resolve them? Nobody knows. This needs empirical data from real operator runs, not assumptions.

  3. Staggered subagent launches for cache optimization. Waiting for the first subagent's response before launching subsequent ones restores cache benefits. But it sacrifices wall-clock time -- the one thing parallelism was supposed to buy. The crossover point where staggering beats full parallelism depends on task duration and prompt size, both of which vary.

  4. Agent Teams as an operator execution backend. The operator could spawn a team for complex features instead of sequential skill invocations. But Agent Teams' experimental status, session management fragility, and delegation compliance issues need stress-testing before this is viable. The operator's reliability contract is non-negotiable.

  5. Dynamic isolation selection. Can the operator detect file overlap before dispatching and choose between file partitioning (cheaper) and worktree isolation (safer)? This requires analyzing task descriptions to predict file sets -- which is asking an LLM to predict another LLM's behavior. Approach with skepticism.

  6. Parallel state machine transitions. The XState machine needs to support concurrent transitions for tasks in different states. This may require machine redesign (parallel states or multiple active state nodes). The machine currently assumes one active transition at a time. Changing this touches the operator's core invariant. A sketch of the parallel-states option follows this list.

  7. Cost-bounded parallelism. Set per-agent token limits and abort parallel branches that exceed budget. The right thresholds are unknown. Too low and you get premature termination on legitimate work. Too high and you have no bound at all.
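The sketch referenced in item 6: XState does support parallel states, which would let the root machine track several in-flight tasks while remaining the single source of truth. The illustration below hard-codes two task regions with hypothetical event names; whether dynamically sized task sets fit this shape is exactly the open design question.

```typescript
import { createMachine } from "xstate";

// Sketch only. In a real operator the regions would be derived from the
// dependency waves rather than written by hand.
const parallelDispatch = createMachine({
  id: "parallelDispatch",
  type: "parallel",
  states: {
    taskA: {
      initial: "running",
      states: {
        running: { on: { TASK_A_DONE: "merged", TASK_A_FAIL: "failed" } },
        failed: { on: { TASK_A_RETRY: "running" } },
        merged: { type: "final" },
      },
    },
    taskB: {
      initial: "running",
      states: {
        running: { on: { TASK_B_DONE: "merged", TASK_B_FAIL: "failed" } },
        failed: { on: { TASK_B_RETRY: "running" } },
        merged: { type: "final" },
      },
    },
  },
});
// The machine completes only when every region reaches its final state --
// one natural place to attach an explicit fan-in / merge policy.
```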


Open Questions

These are the questions that do not have answers in the documentation, the community, or Anthropic's own engineering posts. They require building, measuring, and deciding.

  • How does the XState machine represent concurrent execution? Parallel states? Multiple active nodes? External tracking of subagent handles? The current machine assumes sequential transitions. Every answer here changes the operator's core contract.

  • What is the real-world merge conflict rate for "independent" tasks? The operator categorizes tasks as independent based on dependency analysis. How often is it wrong? Is file partitioning sufficient for a typical feature build, or does worktree isolation always pay for itself? This determines whether the cheaper option is viable.

  • What does partial fan-in failure look like? Two of three parallel tasks succeed and one fails. Advance the successes and retry the failure? Roll back all three? The answer depends on whether the successful tasks' outputs are valid in isolation -- and that depends on the specific feature, not on any general rule.

  • What is the cost-optimal parallelism factor? At what N does cache miss overhead outweigh wall-clock savings? For five agents with 100K shared prefix, the cache penalty is 3.8x. For ten agents, it is worse. The crossover point depends on task duration, and nobody has published measurements.

  • Can the circuit breaker extend to parallel execution? Trip the circuit breaker for the entire parallel batch if the failure rate exceeds threshold? Or per-branch only? The operator's existing circuit breaker was designed for sequential execution. The semantics under parallelism are genuinely ambiguous.

  • Staggered or full parallel? Staggering preserves cache. Full parallel preserves speed. The right answer depends on whether you are cost-constrained or time-constrained, and that changes per run. Is this a runtime decision the operator should make dynamically, or a configuration choice?

  • What is the minimum viable task duration for worktree overhead? Worktree creation and teardown are not free. For a 30-second subagent task, worktree overhead may dominate execution time. Where is the breakpoint below which you should just use file partitioning?

  • When will Agent Teams exit experimental? Anthropic shipped Managed Agents in April 2026 as the production multi-agent offering. Will Agent Teams remain the local/power-user feature? Will Managed Agents subsume it? The answer determines whether investing in Agent Teams integration is a bet on the platform or a bet against it.


Sources

Anthropic Official Documentation

Anthropic Engineering & Research

Community & Ecosystem

External Frameworks & Patterns

GitHub Issues

How Do Agent Teams Work Internally?

Research date: 2026-04-12 Status: Complete

Overview

Agent Teams is an experimental feature shipped in Claude Code v2.1.32 (February 2026) alongside Opus 4.6. It enables one Claude Code session (the "team lead") to spawn independent teammate agents, each with its own context window and full tool access. Teammates communicate peer-to-peer through a mailbox system and coordinate through a shared task list — fundamentally different from subagents, which only report results back to a parent.

Enabled via CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 in environment or settings.json.


Architecture

Four components compose an agent team:

| Component | Role |
|---|---|
| Team Lead | Main Claude Code session. Creates team, spawns teammates, coordinates work, synthesizes results. Cannot be transferred. |
| Teammates | Separate Claude Code instances with independent context windows. Load same project context (CLAUDE.md, MCP servers, skills) but NOT the lead's conversation history. |
| Task List | Shared work items stored at ~/.claude/tasks/{team-name}/. JSON files with dependency tracking. |
| Mailbox | Per-agent inbox files at ~/.claude/teams/{team-name}/inboxes/{agent}.json. Async message delivery. |

Team config lives at ~/.claude/teams/{team-name}/config.json — runtime state (session IDs, tmux pane IDs, member list). Auto-generated; hand edits are overwritten.

Team Config Structure

{
  "name": "string",
  "leadAgentId": "string@team",
  "members": [
    {
      "agentId": "string@team",
      "name": "string",
      "agentType": "string",
      "backendType": "in-process|tmux|iterm2",
      "model": "haiku|sonnet|opus",
      "planModeRequired": boolean,
      "cwd": "string"
    }
  ]
}

Environment Variables Auto-Set for Teammates

  • CLAUDE_CODE_TEAM_NAME
  • CLAUDE_CODE_AGENT_ID
  • CLAUDE_CODE_AGENT_NAME
  • CLAUDE_CODE_PLAN_MODE_REQUIRED

The Seven Internal Tools

Agent Teams are built from seven tools that Claude can call:

| Tool | Purpose |
|---|---|
| TeamCreate (spawnTeam) | Initializes team namespace, creates config.json and directory structure, establishes leader |
| TaskCreate | Writes JSON task file to ~/.claude/tasks/{team-name}/ with description, status, owner, dependencies |
| TaskUpdate | Claim tasks (status: "in_progress"), mark completion, assign owner, declare dependencies via addBlockedBy: [taskIds] |
| TaskList | Returns all tasks with current status; teammates poll this to find unclaimed work |
| TaskGet | Fetch single task details |
| SendMessage | Peer-to-peer messaging with multiple message types (see below) |
| TeamDelete (cleanup) | Removes team config and task files after all teammates shut down |

Additional operations exist on the TeammateTool: discoverTeams, requestJoin/approveJoin/rejectJoin, requestShutdown/approveShutdown/rejectShutdown, approvePlan/rejectPlan.


Shared Task List Semantics

Task States

pending --> in_progress --> completed

How Tasks Are Created

The lead creates tasks via TaskCreate. Each task is a JSON file at ~/.claude/tasks/{team-name}/N.json with fields: ID, subject, description, status, owner, dependencies.

How Tasks Are Claimed

Two modes:

  1. Lead assigns: Lead explicitly tells a specific teammate to take a task.
  2. Self-claim: After finishing a task, a teammate calls TaskList(), finds unclaimed/unblocked tasks, and claims one via TaskUpdate (sets status: "in_progress" and owner field).

Race condition prevention: Task claiming uses file locking to prevent multiple teammates from claiming the same task simultaneously.
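The underlying technique is ordinary exclusive-create locking on the shared filesystem. The sketch below illustrates that general pattern; it is not Claude Code's actual claim implementation, which is not published.

```typescript
import { openSync, closeSync, readFileSync, writeFileSync, unlinkSync } from "node:fs";

// Sketch: atomically claim a task file by taking an exclusive lock file first.
function claimTask(taskPath: string, agentId: string): boolean {
  const lockPath = `${taskPath}.lock`;
  let fd: number;
  try {
    // "wx" fails if the lock file already exists -- atomic on the filesystem.
    fd = openSync(lockPath, "wx");
  } catch {
    return false; // another teammate is claiming this task right now
  }
  try {
    const task = JSON.parse(readFileSync(taskPath, "utf8"));
    if (task.status !== "pending" || task.owner) return false; // already taken
    task.status = "in_progress";
    task.owner = agentId;
    writeFileSync(taskPath, JSON.stringify(task, null, 2));
    return true;
  } finally {
    closeSync(fd);
    unlinkSync(lockPath);
  }
}
```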

Dependency Tracking

Tasks can declare dependencies via addBlockedBy: [taskIds]. A pending task with unresolved dependencies cannot be claimed. When a blocking task completes, dependent tasks auto-unblock without manual intervention. Work executes in waves based on dependency chains.

Known Issue: Status Lag

Teammates sometimes fail to mark tasks as completed, which blocks dependent tasks. Workaround: manually update task status or tell the lead to nudge the teammate. (This is documented as a known limitation.)


Mailbox / Messaging System

Architecture

Each teammate has a dedicated inbox file at ~/.claude/teams/{team-name}/inboxes/{agent}.json. Messages are JSON objects written to these files. Delivery is automatic — the lead doesn't need to poll.

Message Types

| Type | Structure | Sender | Purpose |
|---|---|---|---|
| Regular message | {from, text, timestamp, read} | Any agent | Standard communication |
| broadcast | Same as regular, sent to all | Any agent | Team-wide announcements |
| shutdown_request | {type, requestId, from, reason, timestamp} | Leader | Ask teammate to shut down |
| shutdown_approved | {type, requestId, from, paneId, backendType} | Teammate | Confirm shutdown |
| idle_notification | {type, from, timestamp, completedTaskId} | Teammate | Signal work complete |
| task_completed | {type, from, taskId, taskSubject} | Teammate | Task done notification |
| plan_approval_request | {type, from, requestId, planContent} | Teammate (plan mode) | Submit plan for review |
| permission_request | {type, requestId, workerId, toolName, description, input} | Teammate | Bubble up permission ask |
| join_request | {type, proposedName, requestId, capabilities} | Requestor | Request to join team |

Broadcast vs. Direct

  • Direct (write): Send to one specific teammate by name. Cheap, targeted.
  • Broadcast: Send to ALL teammates simultaneously. Expensive — costs scale linearly with team size (one message per member context window). Use sparingly.

Ordering Guarantees

Messages are written to JSON inbox files with timestamps. No strong ordering guarantees are documented beyond timestamp-based sequencing. This is an async system, not a real-time channel.


Team Lead vs. Teammate Roles

Team Lead

  • Creates the team and spawns teammates
  • Assigns tasks (or lets teammates self-claim)
  • Approves/rejects plans when planModeRequired is set
  • Synthesizes results across all teammates
  • Initiates shutdown and cleanup
  • Fixed for team lifetime — cannot transfer leadership
  • Can enter delegate mode (Shift+Tab) to restrict itself to coordination-only tools (no code touching)

Teammates

  • Full independent Claude Code sessions with own context window
  • Load project context (CLAUDE.md, MCP servers, skills) but NOT lead's conversation history
  • Can message any other teammate by name (peer-to-peer)
  • Can self-claim tasks from the shared task list
  • Can reject shutdown requests with explanation
  • Can use subagent definitions as role templates (tools allowlist, model, instructions)
  • Cannot spawn their own teams or teammates (no nesting)
  • Team coordination tools (SendMessage, task management) remain available even when a subagent definition's tools field restricts other tools

Permissions

All teammates inherit the lead's permission settings at spawn time. If lead uses --dangerously-skip-permissions, all teammates do too. Can change individual modes after spawn, but not at spawn time.


Display Modes / Spawn Backends

| Mode | How It Works | Requirements |
|---|---|---|
| in-process | Same Node.js process, hidden panes. Navigate with Shift+Down. | Any terminal. Default outside tmux. |
| tmux | Separate tmux panes, visible simultaneously. | tmux installed. |
| iterm2 | Split panes in iTerm2. | iTerm2 + it2 CLI + Python API enabled. |

Auto-detection: checks $TMUX env var, then $TERM_PROGRAM for iTerm2, then falls back to in-process.

Configure globally in ~/.claude.json:

{ "teammateMode": "in-process" }

Or per-session: claude --teammate-mode in-process


Quality Gates: Hooks

Three hook events for team governance:

| Hook | When It Fires | Exit Code 2 Effect |
|---|---|---|
| TeammateIdle | Teammate about to go idle | Send feedback, keep teammate working |
| TaskCreated | Task being created | Prevent creation with feedback |
| TaskCompleted | Task being marked complete | Prevent completion with feedback |

Use cases: enforce test passing before task closure, require lint success, auto-assign follow-up tasks.
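A sketch of the test-passing gate as a TaskCompleted hook, written as a Node script. It assumes the usual hook contract (JSON payload on stdin; exit code 2 blocks the action and feeds stderr back), which matches the table above; the payload field read here is illustrative rather than a documented schema.

```typescript
#!/usr/bin/env node
// TaskCompleted quality gate sketch: block completion unless the test suite passes.
import { execSync } from "node:child_process";
import { readFileSync } from "node:fs";

const payload = JSON.parse(readFileSync(0, "utf8")); // hook input on stdin

try {
  execSync("npm test --silent", { stdio: "pipe" });
  process.exit(0); // tests pass: allow the task to be marked completed
} catch {
  // Exit code 2 prevents completion; stderr is returned to the teammate as feedback.
  console.error(`Tests must pass before completing task ${payload.taskId ?? "(unknown)"}.`);
  process.exit(2);
}
```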

Plan Approval Workflow

  1. Teammate spawned with planModeRequired: true
  2. Teammate works in read-only plan mode — can explore but not modify
  3. Teammate submits plan_approval_request to lead
  4. Lead reviews, approves or rejects with feedback
  5. If rejected: teammate revises and resubmits
  6. If approved: teammate exits plan mode, begins implementation

Lead makes approval decisions autonomously. Influence via prompt: "only approve plans that include test coverage."


Limitations (Complete List)

Documented Limitations

  1. No session resumption: /resume and /rewind do not restore in-process teammates. Lead may try to message dead teammates. Workaround: tell lead to spawn new ones.
  2. No nested teams: Teammates cannot spawn their own teams or teammates.
  3. One team per session: Clean up current team before starting a new one.
  4. Lead is fixed: Cannot promote a teammate to lead or transfer leadership.
  5. Task status can lag: Teammates sometimes forget to mark tasks completed, blocking dependents.
  6. Shutdown can be slow: Teammates finish current request/tool call before exiting.
  7. Permissions set at spawn: All teammates start with lead's mode. No per-teammate modes at spawn time.
  8. Split panes limited: Not supported in VS Code integrated terminal, Windows Terminal, or Ghostty.
  9. No project-level team config: .claude/teams/teams.json in project dir is NOT recognized as configuration.
  10. Subagent skills and mcpServers frontmatter not applied when running as teammate.

Undocumented / Community-Discovered Issues

  1. VS Code extension: tools unavailable — TeammateTool, SendMessage, spawnTeam not available in VS Code extension even with env var set (#28048).
  2. VS Code: message delivery broken — Messages not delivered to team lead; permission prompts invisible, causing deadlock (#25254).
  3. tmux race condition — send-keys fires before shell is ready in new pane, teammates fail to start (#40168).
  4. IPC socket hang — Lead session hangs indefinitely when unix socket peer disconnects, no timeout (#33043).
  5. CLAUDE_CONFIG_DIR not inherited — Spawned teammates don't respect config dir, fail to share task list (#23676).
  6. Shift+Up/Down auto-sends — Pre-filled suggestion text sent to wrong agent when switching panes (#26511).
  7. Bedrock model mismatch — Teammates spawned with non-Bedrock model ID via --model flag on Bedrock (#23561).
  8. Memory leak (fixed) — Completed teammate tasks never garbage collected from session state.
  9. Model overrides delegation — Claude ignores mandatory delegation triggers in CLAUDE.md, performs work inline "for efficiency" even when Agent Teams is enabled (#42856). Governance hooks (PreToolUse) don't fire in Agent tool subagents, so bypassing delegation breaks the entire governance architecture.
  10. TaskUpdate status not synced between team and session task lists (#23629).

Stability Assessment

Experimental Status

Agent Teams shipped as a research preview in v2.1.32. The feature is gated behind an environment variable and has a prominent "experimental" warning in docs. It is NOT considered production-ready by Anthropic.

What Works Well

  • Parallel research/review tasks with clear boundaries
  • Competing hypothesis debugging (adversarial pattern)
  • Independent module development where teammates own separate file sets
  • The C compiler project (16 agents, 100K lines of Rust, compiled Linux kernel) validated the core architecture at scale

What's Fragile

  • Session management: No resume, no rewind, dead teammates after disconnect
  • VS Code integration: Multiple blocking issues make it effectively broken in VS Code
  • tmux spawning: Race conditions in pane initialization
  • Task tracking reliability: Status lag is a known, unresolved issue
  • Delegation compliance: Model sometimes ignores team structure and does work inline
  • Cost control: Each teammate is a full context window; 3-agent team ~3-4x tokens; plan approval phases ~7x tokens

Changelog Signal

  • v2.1.32: Initial release (research preview)
  • v2.1.33: tmux messaging fix, TeammateIdle/TaskCompleted hooks added
  • Subsequent versions: Memory leak fix, Bedrock/Vertex/Foundry compatibility
  • v2.1.101: Permission inheritance fix, /team-onboarding command

Active bug fixes suggest ongoing investment, but the pace is incremental rather than rapid. The feature remains behind an experimental flag with no announced timeline for GA.

Roadmap Signal

  • Managed Agents (launched April 8, 2026 in public beta) is Anthropic's production-grade multi-agent offering — suggests Agent Teams may remain a power-user/local feature
  • Community requests for role-based model selection (lead=Opus, workers=Sonnet, tests=Haiku) not yet addressed
  • No announced plans for session resumption, nested teams, or leadership transfer

Cost Profile

| Scenario | Token Multiplier |
|---|---|
| 3 teammates, 30 min | ~3-4x single session |
| Plan approval phases | ~7x standard |
| 5-6 tasks per teammate | Recommended sweet spot |
| 16 agents (C compiler scale) | $20K over 2 weeks, 2B input + 140M output tokens |

Recommended team size: 3-5 teammates for most workflows. Having 5-6 tasks per teammate keeps everyone productive.


Key Architectural Insight

Agent Teams is fundamentally a file-system-based coordination protocol. Tasks are JSON files. Inboxes are JSON files. Team config is a JSON file. File locking prevents races. This makes the system inspectable, debuggable, and hackable — but also means it inherits file system limitations (no real-time guarantees, potential for stale reads, platform-dependent locking behavior).

The peer-to-peer messaging model (vs. hub-and-spoke in subagents) is the core architectural differentiator. It enables patterns like adversarial debugging, collaborative research, and self-organizing swarms that subagents cannot support.


Sources

title: "Subtask 2: Worktree Isolation Mechanics in Claude Code"
date: 2026-04-12
parent: claude-code-parallel-primitives
status: done

Summary

Claude Code provides first-class git worktree support for isolating parallel sessions and subagents. Worktrees create independent working directories with their own branch, index, and files while sharing the same repository history. There are three entry points: the --worktree CLI flag for user sessions, the isolation: worktree subagent frontmatter field, and the EnterWorktree tool available mid-session. Agent Teams (experimental) use worktrees implicitly for each teammate. Merge is manual by design -- Claude creates branches but the user (or a coordinating agent) decides how to integrate them. Cleanup is automatic for unchanged worktrees and prompt-driven for changed ones.


1. Worktree Creation

Entry points

| Method | Who uses it | Behavior |
|---|---|---|
| `claude --worktree <name>` / `claude -w <name>` | User at CLI | Creates worktree, starts interactive session inside it |
| `claude --worktree` (no name) | User at CLI | Auto-generates random name (e.g., "bright-running-fox") |
| `isolation: worktree` frontmatter | Custom subagent definition | Each invocation of that subagent auto-creates a worktree |
| "use worktrees for your agents" | Natural language prompt | Claude applies worktree isolation to subagent invocations |
| EnterWorktree tool | Mid-session (user says "work in a worktree") | Creates worktree and switches session's cwd into it |

Filesystem layout

Worktrees are created at:

<repo>/.claude/worktrees/<name>/

The branch is named worktree-<name> (e.g., EnterWorktree(name="invoice-pdf-export") creates branch worktree-invoice-pdf-export).

Best practice: add .claude/worktrees/ to .gitignore to prevent worktree contents from appearing as untracked files.

Base branch selection

The worktree branches from the default remote branch, which is wherever origin/HEAD points. This reference is set once during git clone and is not automatically updated if the remote's default branch changes.

To re-sync: git remote set-head origin -a
To override: git remote set-head origin your-branch-name

The base branch is not configurable through any Claude Code flag or setting. For per-invocation control, use a WorktreeCreate hook (see section 5).

.worktreeinclude -- copying gitignored files

Git worktrees are fresh checkouts and do not include untracked files. To copy gitignored files (.env, .env.local, secrets) into new worktrees, create a .worktreeinclude file in the project root using .gitignore syntax:

.env
.env.local
config/secrets.json

Rules: only files matching a pattern AND also gitignored get copied. Tracked files are never duplicated. This applies to --worktree, subagent worktrees, and Desktop app parallel sessions.

Note: .worktreeinclude is not processed when a custom WorktreeCreate hook is configured -- the hook replaces default behavior entirely, so copy files inside the hook script.


2. Parallel Agent Behavior

Subagent worktrees

When a subagent has isolation: worktree, each invocation gets its own independent worktree with its own branch. Multiple subagents fired in parallel each get separate worktrees -- they are fully independent:

  • Separate branch (worktree-<auto-name>)
  • Separate working directory
  • Separate git index
  • No shared state between them

Agent A can rewrite src/auth.ts while Agent B rewrites the same file in a different worktree. There is no coordination or conflict at the filesystem level.

Subagent results returned to parent

Subagents return a text summary to the parent agent. The parent does not automatically receive the diff or commit -- it gets the subagent's final text output. The worktree (with its branch and commits) persists on disk if changes were made, but the parent must explicitly interact with the branch (merge, cherry-pick, etc.) if it wants to integrate the changes.

Agent Teams

Agent Teams (experimental, requires CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1) give each teammate its own context window. The documentation says teammates work independently, each in its own worktree. The team lead coordinates and can merge results sequentially.

Key differences from subagent worktrees:

  • Teammates can message each other directly (subagents cannot)
  • Teammates have a shared task list with claim/dependency semantics
  • Teammates are full Claude Code sessions, not transient subagents
  • The lead merges results, similar to how developers work on separate branches and merge through PRs

3. Merge Semantics

No automatic merge

Claude Code does not automatically merge worktree branches back to the base branch. This is by design. The merge workflow is:

  1. Subagent/teammate works in its worktree, makes commits on its worktree-<name> branch
  2. When done, the worktree persists (if changes exist)
  3. The user or coordinating agent decides how to integrate:
    • git merge
    • git cherry-pick
    • Create a PR with gh pr create
    • Ask Claude to merge in the main session

Conflict handling

Since each worktree has its own branch, conflicts surface only at merge time. There is no built-in conflict resolution -- standard git merge conflict resolution applies. Prevention strategies from the ecosystem:

  • Scope each agent's work tightly (different files/modules)
  • Merge from main frequently to keep worktree branches up to date
  • Use the team lead to coordinate sequential merging

Agent Teams merge pattern

The team lead merges results sequentially -- conceptually similar to reviewing and merging PRs one at a time. Claude is reportedly good at handling merge conflicts when given gh CLI access and PR context. But this is not automated infrastructure; it depends on the lead's prompting and tool access.


4. Cleanup Behavior

User worktrees (--worktree)

On session exit:

  • No changes: worktree and branch removed automatically
  • Changes/commits exist: Claude prompts the user to keep or remove. Keeping preserves directory and branch. Removing deletes everything, discarding uncommitted changes and commits.

User-created worktrees (--worktree) are never removed by the automatic cleanup sweep.

Subagent worktrees

  • No changes: cleaned up automatically when the subagent finishes
  • Changes exist: worktree persists on disk for manual review

Orphaned subagent worktrees

Subagent worktrees orphaned by a crash or interrupted parallel run are removed automatically at startup, subject to:

  • Older than cleanupPeriodDays setting (default: 30 days, minimum: 1)
  • No uncommitted changes
  • No untracked files
  • No unpushed commits

If any of those conditions fail, the orphaned worktree is preserved.

ExitWorktree tool

The ExitWorktree tool provides two actions:

  • "keep": leaves worktree on disk for manual inspection
  • "remove": deletes worktree with safety checks for uncommitted changes/new commits

Known bugs (as of April 2026)

  • v2.1.101 fixed: claude -w <name> failing with "already exists" after a previous session's cleanup left a stale directory
  • v2.1.92 fixed: stale subagent worktree cleanup removing worktrees that contain untracked files
  • v2.1.89 fixed: subagents with worktree isolation leaking their working directory back to the parent session's Bash tool

5. Hooks: WorktreeCreate and WorktreeRemove

WorktreeCreate

Fires when: --worktree flag or isolation: "worktree" triggers worktree creation. Effect: replaces default git worktree logic entirely.

Input schema:

{
  "session_id": "abc123",
  "hook_event_name": "WorktreeCreate",
  "worktree_path": "/path/to/worktree",
  "source_path": "/path/to/source/repo",
  "branch": "feature-branch"
}

Output requirement: must print the absolute worktree path to stdout on exit 0. Failure: any non-zero exit code aborts worktree creation (unlike other hooks where only exit code 2 blocks). No matcher support -- always fires on every worktree creation.

WorktreeRemove

Fires when: worktree is being removed (session exit or subagent finish). Effect: cannot block removal -- failures are logged in debug mode only.

Input schema includes reason field: "session_exit" or "subagent_finish". No matcher support.

Use cases for hooks

  • Non-git VCS (SVN, Perforce, Mercurial): implement custom checkout/cleanup
  • Custom base branch per invocation
  • Database isolation (creating per-worktree DB copies)
  • Audit trail logging
  • Custom dependency installation
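For the custom-base-branch case, a hook script might look like the sketch below, which follows the input schema and output contract documented above (print the absolute worktree path on exit 0). The BASE_BRANCH variable is this sketch's own convention, and because a custom hook replaces default behavior, a real version would also have to copy any .worktreeinclude-style files itself.

```typescript
#!/usr/bin/env node
// WorktreeCreate hook sketch: create the worktree from a custom base branch.
import { execFileSync } from "node:child_process";
import { readFileSync } from "node:fs";

const input = JSON.parse(readFileSync(0, "utf8")); // hook payload on stdin
const base = process.env.BASE_BRANCH ?? "origin/develop"; // hypothetical default

try {
  execFileSync(
    "git",
    ["worktree", "add", "-b", input.branch, input.worktree_path, base],
    { cwd: input.source_path, stdio: "pipe" }
  );
  process.stdout.write(input.worktree_path); // contract: absolute path on stdout
  process.exit(0);
} catch (err) {
  console.error(`WorktreeCreate hook failed: ${err}`);
  process.exit(1); // any non-zero exit aborts worktree creation
}
```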

6. Monorepo Considerations

Dependencies (node_modules)

Git worktrees are fresh checkouts -- node_modules is not present. Each worktree needs its own dependency installation unless mitigated:

Setting: worktree.symlinkDirectories

{
  "worktree": {
    "symlinkDirectories": ["node_modules", ".cache"]
  }
}

Symlinks specified directories from the main repo into each worktree. Avoids duplicating large directories.

Known bug: Claude Code's atomic write pattern (write temp -> rename) replaces symlinks with regular files when it writes to a symlinked file. A PreToolUse hook workaround exists that redirects writes to the symlink target.

Setting: worktree.sparsePaths

{
  "worktree": {
    "sparsePaths": ["packages/my-app", "shared/utils"]
  }
}

Uses git sparse-checkout (cone mode) to write only listed directories to disk. Useful when a task needs only a subset of a large monorepo.

Build caches

Build caches (.cache, .next, .turbo, etc.) are not shared by default. Options:

  • Include in symlinkDirectories (risk: cache conflicts between parallel agents)
  • Let each worktree rebuild (safe but slow)
  • Use a WorktreeCreate hook for custom cache setup

pnpm global store

pnpm's global virtual store is particularly well-suited: each worktree's node_modules contains only symlinks into a single content-addressable store. Adding a new worktree is fast and costs almost no extra disk space.

Vite/Vitest gotcha

If using Vite or Vitest, add .claude/worktrees/** to excluded paths for tests, otherwise file watchers across all worktrees will spike CPU.
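One way to do that in a vitest.config.ts, keeping Vitest's defaults and adding the worktree directory (a sketch; fold it into whatever config already exists):

```typescript
import { defineConfig, configDefaults } from "vitest/config";

// Exclude Claude Code worktrees so watchers across parallel worktrees
// do not pick up each other's files.
export default defineConfig({
  test: {
    exclude: [...configDefaults.exclude, ".claude/worktrees/**"],
  },
});
```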


7. Key Constraints and Limitations

  1. Worktree sessions are transient: cannot be resumed via claude --resume. The session and its worktree are coupled to a single lifecycle.
  2. No nested worktrees: cannot call EnterWorktree from inside an existing worktree.
  3. Subagents cannot spawn subagents: a worktree-isolated subagent cannot itself fire subagents with their own worktrees.
  4. Base branch not configurable via flag: must use git remote set-head or a WorktreeCreate hook.
  5. No auto-merge: branches must be integrated manually or via coordinating agent.
  6. symlinkDirectories write bug: atomic writes replace symlinks with regular files.
  7. Agent Teams experimental: one team per session, no session resumption, no nested teams.

8. Decision Matrix for Operator Design

| Scenario | Recommended approach |
|---|---|
| Parallel read-only research | Subagents without worktrees (no filesystem conflicts) |
| Parallel code generation, different files | Subagents with isolation: worktree |
| Parallel code generation, overlapping files | Subagents with worktree + post-merge by parent |
| Complex multi-step with inter-agent communication | Agent Teams (experimental) |
| Monorepo with large deps | Worktree + symlinkDirectories + pnpm if possible |
| Non-git VCS | WorktreeCreate/WorktreeRemove hooks |
| Custom base branch per task | WorktreeCreate hook |

Sources

title: Can parallel subagents safely write code to the same branch?
date: 2026-04-12
parent: claude-code-parallel-primitives
subtask: 3
status: complete

Summary

Parallel subagents in Claude Code are genuinely concurrent -- they are not queued. They share the same working directory and filesystem by default. Without worktree isolation, concurrent code writes to the same branch are unsafe: file write races produce silent corruption, git index lock contention causes commit failures, and overlapping edits lead to data loss. The only sanctioned pattern for safe parallel code generation on a shared branch is strict file partitioning (each agent owns disjoint files). For anything beyond that, worktree isolation is required.

Finding 1: Subagents are truly concurrent, not queued

When Claude invokes the Agent tool (formerly Task tool) multiple times in a single response, those subagents fire concurrently. Wall-clock execution time equals the slowest subagent, not the sum.

Key evidence:

  • Official SDK docs state: "Multiple subagents can run concurrently, dramatically speeding up complex workflows" (Subagents in the SDK)
  • Technical analysis confirms: "Three subagents that each take thirty seconds finish in thirty seconds. Not ninety." (Medium: How the Task Tool Actually Distributes Work)
  • Each subagent is a separate API call running in its own context window. The parent agent does not wait for one to finish before starting the next.

The concurrency is at the process/API-call level. Claude decides when to parallelize based on task independence -- the developer defines the capability via agent definitions, not the scheduling.

Finding 2: Parallel subagents share the same working directory by default

Without isolation: worktree, all subagents operate on the same filesystem checkout as the parent session. There is no implicit sandboxing.

Key evidence:

  • Official docs: "A subagent starts in the main conversation's current working directory" (Create custom subagents)
  • The cd command does not persist between Bash calls within a subagent, but the base working directory is shared across all concurrent subagents.
  • Technical analysis: "Subagents inherit the parent session's tool permissions, including filesystem access."

Implication: two subagents spawned in the same turn can read and write the same files simultaneously. There is no filesystem-level locking, no copy-on-write, and no transaction semantics.

Finding 3: Concurrent writes to the same file produce silent corruption

This is the critical safety finding. When multiple subagents write to the same file concurrently, the result is nondeterministic garbage. There are no warnings, no errors, no conflict markers.

Key evidence:

The failure mode is last-write-wins at the OS level. The Edit tool does string replacement, so if Agent A's edit changes line counts before Agent B's edit executes, Agent B's old_string match may fail or match the wrong location.

Finding 4: Git index lock contention causes commit failures

Git uses .git/index.lock as a mutex. When two processes attempt to stage or commit simultaneously, the second process fails with fatal: Unable to create '.git/index.lock': File exists.

Key evidence:

  • GitHub issue #28823: Race condition with git index.lock during lint-staged pre-commit failures. Claude Code sees a failure, immediately retries, and hits the lock file.
  • Git's design: only one process can hold the index lock at a time. Concurrent git add or git commit calls will fail.
  • From lint-staged experience: "When chunked files are split up into enough groups the chance of a race condition on calling git add increases."

Even if two subagents write to completely disjoint files, they cannot safely run git add and git commit at the same time on the same checkout. The git index is a single shared resource.
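If the operator cannot fully serialize git operations, the pragmatic mitigation is to treat index.lock contention as transient and retry with backoff. A sketch (the retry counts and delays are arbitrary choices, and this does not make concurrent writers safe -- it only smooths over short-lived lock collisions):

```typescript
import { execFileSync } from "node:child_process";

// Sketch: retry git commands that fail on .git/index.lock contention.
async function gitWithRetry(args: string[], attempts = 5): Promise<string> {
  for (let i = 0; i < attempts; i++) {
    try {
      return execFileSync("git", args, { encoding: "utf8" });
    } catch (err: any) {
      const stderr: string = err?.stderr ?? "";
      if (!stderr.includes("index.lock") || i === attempts - 1) throw err;
      // Exponential backoff with jitter before the next attempt.
      await new Promise((r) => setTimeout(r, 200 * 2 ** i + Math.random() * 100));
    }
  }
  throw new Error("unreachable");
}

// Example: await gitWithRetry(["add", "src/api/handler.ts"]);
```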

Finding 5: Known patterns for safe parallel code generation

Pattern 1: Strict file partitioning (works without worktrees)

Each subagent is assigned exclusive ownership of specific files/directories. No two agents touch the same file. Commits are serialized after all agents complete.

  • "Parallel dispatch requires 3+ unrelated tasks with no shared state between tasks and clear file boundaries with no overlap."
  • Domain-based example: Frontend agent owns src/components/**, Backend agent owns src/api/**, Database agent owns src/db/**.
  • Limitation: agents cannot touch shared configuration files, routing tables, barrel exports, or any other "hotspot" files.
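Enforcing that partition before dispatch is cheap to check. A sketch using path-prefix ownership (the ownership map is this sketch's convention; real plans would be derived from task descriptions):

```typescript
// Sketch: verify that planned file ownership is disjoint before fanning out
// parallel agents on a shared checkout. Any overlap means falling back to
// worktree isolation instead of file partitioning.
type OwnershipPlan = Record<string, string[]>; // agent name -> owned path prefixes

export function findOverlaps(plan: OwnershipPlan): string[] {
  const conflicts: string[] = [];
  const entries = Object.entries(plan);
  for (let i = 0; i < entries.length; i++) {
    for (let j = i + 1; j < entries.length; j++) {
      const [agentA, pathsA] = entries[i];
      const [agentB, pathsB] = entries[j];
      for (const a of pathsA) {
        for (const b of pathsB) {
          // Prefix containment means both agents could touch the same files.
          if (a.startsWith(b) || b.startsWith(a)) {
            conflicts.push(`${agentA}:${a} overlaps ${agentB}:${b}`);
          }
        }
      }
    }
  }
  return conflicts;
}

// Example: findOverlaps({ frontend: ["src/components/"], backend: ["src/api/"] }) // []
```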

Pattern 2: Worktree isolation (recommended for code writes)

Set isolation: worktree in the subagent frontmatter. Each subagent gets its own branch, working directory, and git index.

  • "With worktree isolation, each agent has the entire codebase to itself." (Claude Code Worktrees)
  • Git worktrees share the .git object store but have independent indexes and HEADs.
  • Constraint: two worktrees cannot check out the same branch simultaneously.
  • Cleanup: worktrees with no changes are auto-removed when the subagent finishes.

Pattern 3: Read-only parallel, sequential writes

Use parallel subagents for research/analysis (read-only tools), then have the parent agent apply changes sequentially based on subagent findings.

  • This is the safest pattern when multiple agents need to inform changes to the same files.
  • Subagent tools field is restricted to Read, Grep, Glob -- no Edit, Write, or Bash.

Pattern 4: File-based coordination with post-merge (Anthropic's C compiler approach)

From Anthropic's Building a C Compiler project:

  • 16 agents ran in parallel, each in its own Docker container with a cloned repo.
  • File-based locking: agents wrote text files to current_tasks/ to claim work.
  • Git's built-in synchronization forced conflict resolution: agents pull upstream, merge, then push.
  • "Merge conflicts are frequent, but Claude is smart enough to figure that out."
  • This approach explicitly accepts and resolves conflicts rather than preventing them.
  • Critical limitation: when all agents hit the same bug (e.g., compiling Linux kernel), parallelization collapses because every agent overwrites the same fix.

Pattern 5: Agent Teams with shared task list

For the most complex coordination needs:

  • Each teammate gets its own context window and (optionally) its own worktree.
  • Shared task list with file-based locking prevents duplicate claims.
  • Teammates can message each other directly.
  • Sequential task claiming prevents race conditions on the task list itself.

Finding 6: What the operator should do

Given these findings, for the operator agent's parallel execution design:

| Scenario | Recommended approach |
|---|---|
| Parallel research/analysis (no writes) | Subagents with read-only tools, no isolation needed |
| Parallel code generation to disjoint files | Subagents with file partitioning, sequential commits after completion |
| Parallel code generation with possible overlap | isolation: worktree on each subagent, merge after completion |
| Complex multi-agent coordination | Agent Teams (experimental) |
| Single-file changes from multiple perspectives | Sequential subagents, not parallel |

The key insight: parallelism is safe for reads, dangerous for writes, and requires explicit isolation for overlapping writes. Claude Code provides no implicit safety net -- the orchestrator (operator) must enforce the boundaries.

Open questions

  • How does worktree merge actually work when the parent agent receives results? Is it automatic or does the parent need to git merge manually?
  • What is the performance cost of worktree creation/teardown for short-lived subagents?
  • Can the operator detect file overlap before dispatching and dynamically choose isolation vs. partitioning?

Sources

What Supervisor/Orchestration Patterns Exist for Multi-Agent Coordination?

Research date: 2026-04-12 Status: Complete

Overview

Multi-agent orchestration is the central design problem in agentic AI. Anthropic's own analysis of 200+ enterprise deployments found that 57% of project failures originated in orchestration design — agents were individually capable but poorly coordinated. This document maps the landscape of supervisor/coordination patterns across Claude Code's native primitives, Anthropic's published guidance, competing frameworks, and the broader industry, with specific attention to how these patterns inform the operator agent's parallel execution design.


1. Claude Code Native Orchestration Primitives

Claude Code offers three distinct mechanisms for multi-agent coordination, each at a different abstraction level.

1a. Subagents (Agent Tool)

The simplest delegation primitive. A parent agent spawns a child via the Agent tool; the child runs in its own context window and returns a summary to the parent.

Architecture:

  • Parent fires Agent tool call with a prompt, optional model/tools/isolation config
  • Child gets its own context window with a custom system prompt
  • Child works independently, returns results as a string summary
  • Parent never sees the child's internal reasoning, only the compressed output
  • Children cannot spawn their own subagents (no nesting)

Parallel execution: Multiple Agent tool calls in a single message fire concurrently. This is the key primitive — Claude can invoke 3-5 subagents simultaneously if the tool calls appear in the same response.

Configuration surface (frontmatter fields):

  • tools / disallowedTools — restrict capabilities
  • model — route to cheaper/faster models (Haiku for exploration, Sonnet for implementation)
  • isolation: worktree — git worktree isolation for filesystem safety
  • maxTurns — bound execution length
  • permissionMode — control approval behavior
  • background: true — run without blocking the parent
  • skills — inject domain knowledge
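Rendered as a TypeScript type for reference only -- the real configuration lives in YAML frontmatter of an agent markdown file, and field names should be checked against the current docs:

```typescript
// Illustrative only: mirrors the frontmatter fields listed above.
interface SubagentFrontmatter {
  tools?: string[];            // allow-list, e.g. ["Read", "Grep", "Glob"]
  disallowedTools?: string[];  // deny-list alternative
  model?: "haiku" | "sonnet" | "opus";
  isolation?: "worktree";      // git worktree isolation for filesystem safety
  maxTurns?: number;           // bound execution length
  permissionMode?: string;     // control approval behavior
  background?: boolean;        // run without blocking the parent
  skills?: string[];           // inject domain knowledge
}
```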

Fan-in semantics: Results return as strings to the parent's context. The parent must synthesize/aggregate. No structured merge — it's free-form text compression.

Key limitation: No inter-child communication. All coordination must flow through the parent. This creates a bottleneck when workers need to share intermediate discoveries.

1b. Agent Teams (Experimental)

A higher-level coordination primitive shipped February 2026. One session acts as team lead; teammates are independent Claude Code instances that communicate peer-to-peer.

Key differences from subagents:

| Dimension | Subagents | Agent Teams |
|---|---|---|
| Communication | Report to parent only | Peer-to-peer messaging + shared task list |
| Coordination | Parent manages all | Self-coordinating via task claims |
| Context | Own window; results compressed back | Own window; fully independent |
| Lifecycle | Ephemeral per invocation | Persistent for team duration |
| Cost | Lower (compressed returns) | Higher (each is a full Claude instance) |
| Best for | Focused tasks where only result matters | Complex work requiring discussion |

Coordination primitives:

  • Shared task list with pending/in-progress/completed states and dependency tracking
  • File-locking for task claims (prevents race conditions)
  • Mailbox system for direct and broadcast messaging
  • Plan approval gates — teammates can be required to plan before implementing
  • Hooks — TeammateIdle, TaskCreated, TaskCompleted for programmatic quality gates
  • Auto-unblocking — completing a dependency auto-unblocks downstream tasks

Current limitations:

  • Experimental (env var opt-in)
  • No session resumption with in-process teammates
  • No nested teams
  • One team per session
  • Lead cannot be transferred
  • Task status can lag (teammates sometimes fail to mark completion)

1c. Manual Parallel Sessions (Git Worktrees)

The lowest-level primitive: multiple independent Claude Code sessions running in separate git worktrees. No automated coordination — the human orchestrates.

When this is appropriate: Maximum isolation, maximum human control, no inter-agent coordination needed.

Comparison Matrix

| Pattern | Parallelism | Communication | Coordination | Token Cost | Human Effort |
|---|---|---|---|---|---|
| Subagents | Yes (same message) | Parent only | Parent manages | Low-Medium | Low |
| Agent Teams | Yes (persistent) | Peer-to-peer | Self-coordinating | High | Medium |
| Manual Worktrees | Yes (manual) | None | Human | Variable | High |

2. Anthropic's Published Orchestration Patterns

2a. "Building Effective Agents" — The Canonical Patterns

Anthropic's foundational guide identifies five composable patterns, plus a meta-recommendation about simplicity.

Orchestrator-Workers Pattern: A central LLM dynamically breaks down tasks, delegates to worker LLMs, and synthesizes results. The key distinction from parallelization is flexibility — subtasks aren't pre-defined but determined by the orchestrator based on the specific input.

  • Best for: coding changes across multiple files, research tasks with uncertain scope
  • The orchestrator decides what to parallelize at runtime
  • Workers report back; orchestrator synthesizes

Parallelization Pattern (two variations):

  1. Sectioning — break work into independent parallel subtasks (e.g., guardrails checking + query processing simultaneously)
  2. Voting — run identical tasks multiple times for diversity/confidence (e.g., code vulnerability reviews from multiple prompts)

Routing Pattern: Classify inputs and route to specialized handlers. Prevents optimization for one category from degrading others. Can route to different models (Haiku for simple, Sonnet for complex).

Meta-recommendation: "Start with single LLM calls. Add complexity only when measurable performance gains justify the cost increase." Multi-agent adds latency and cost — validate the tradeoff.

2b. Multi-Agent Research System

Anthropic's internal research system uses an orchestrator-worker pattern with these specifics:

Architecture:

  • Lead agent analyzes queries, develops strategies, spawns 3-5 subagents in parallel
  • Subagents use 3+ tools in parallel (reducing research time by up to 90% for complex queries)
  • A dedicated CitationAgent processes documents for source attribution

Task decomposition rules:

  • Simple queries: 1 agent, 3-10 tool calls
  • Comparisons: 2-4 subagents, 10-15 calls each
  • Complex research: 10+ subagents with clearly divided responsibilities

Critical finding: Lead agents currently execute subagents synchronously — waiting for each batch to complete before proceeding. This simplifies coordination but creates bottlenecks.

Performance: Opus 4 leading + Sonnet 4 subagents outperformed single-agent Opus 4 by 90.2% on internal research evaluation.

Cost reality: Multi-agent systems use ~15x more tokens than chat interactions. Token usage alone explains 80% of performance variance.

2c. Harness Design for Long-Running Apps

Anthropic's three-agent harness (Planner -> Generator -> Evaluator) reveals key orchestration principles:

Architecture:

  • Planner: Transforms brief prompts into comprehensive specs (10-16 features)
  • Generator: Implements features incrementally with version control
  • Evaluator: QA via Playwright MCP, testing like a user would

Key findings:

  • 20x cost premium: Full harness costs ~$200 vs $9 for solo agent — but produces dramatically better output
  • Self-evaluation fails: "Out of the box, Claude is a poor QA agent." Agents confidently praise their own mediocre work. Separating generator from evaluator is essential.
  • GAN-inspired pattern: Generator-evaluator creates genuine feedback loops. The evaluator required careful calibration toward appropriate skepticism.
  • File-based handoffs: Agents communicate through structured files, not conversation. One writes; the next reads.
  • Sprint contracts: Generator and evaluator negotiate explicit deliverables and success criteria before implementation begins.

Evolution insight: "Every component in a harness encodes an assumption about what the model can't do on its own, and those assumptions are worth stress testing." Better models don't eliminate scaffolding — they shift where complexity lives.

2d. Managed Agents — Brain/Hands/Session

The newest Anthropic architecture (2026) virtualizes the agent into three decoupled components:

  • Brain (Harness): Stateless reasoning service. execute(name, input) -> string interface.
  • Hands (Execution): Sandboxes, containers, tools — all interchangeable through standardized interfaces.
  • Session (State): Append-only event log. Lives outside the harness. Enables rewind, failover, parallel reasoning instances.

Multi-agent implications:

  • Multiple stateless brains can connect to the same session via wake(sessionId)
  • Brains can "pass hands to one another" through tool interfaces
  • Agents coordinate through the shared session event log (implicit message bus)
  • p50 TTFT dropped ~60%, p95 dropped >90% from decoupling brain from hands

This architecture is the meta-framework: unopinionated about specific harness implementations but strict about stable interfaces.


3. External Framework Patterns

3a. LangGraph — Graph-Based State Machine Orchestration

LangGraph treats agents as nodes in a directed graph with typed state flowing through edges.

Supervisor pattern:

  • Central supervisor agent coordinates multiple specialized agents
  • Controls all communication flow and task delegation
  • Decides which agent to invoke based on current context
  • Maintains shared, persistent state across the workflow
  • Built-in checkpointing with time travel (replay/rewind)

Fan-out/fan-in:

  • Send API for dynamic fan-out to multiple nodes
  • State merging with reducer functions for fan-in
  • Async nodes for parallel execution
  • Claimed 60% latency reduction using parallel patterns

State management: Explicit, reducer-driven state schemas using TypedDict and Annotated types. Reducer functions prevent data loss during concurrent updates. This is the key differentiator — structured state over free-form text.

Production characteristics: Maximum control, compliance-ready, production-grade state management. Higher learning curve.

3b. CrewAI — Role-Based Crew Orchestration

CrewAI uses a role metaphor: agents have roles, backstories, and goals. Crews execute tasks through processes.

Process types:

  • sequential — tasks execute in order
  • hierarchical — manager agent delegates to workers
  • consensual — agents negotiate (experimental)

Characteristics: Rapid prototyping, intuitive abstraction. But: consumed nearly 2x tokens and 3x+ time in benchmarks due to multi-step verification overhead.

3c. OpenAI Agents SDK — Handoffs and Agents-as-Tools

Successor to Swarm (March 2025). Two core orchestration primitives:

Agents as Tools (Manager Pattern):

  • Manager agent maintains control, invokes specialists via Agent.as_tool()
  • Specialist helps with bounded subtask but doesn't take over conversation
  • Manager combines outputs, enforces unified guardrails

Handoffs (Delegation Pattern):

  • Triage agent routes to specialist who becomes the active agent
  • Specialist owns the next part of the interaction
  • Context history collapsed into single message during handoff (v0.6.0 breaking change)

Hybrid: A triage agent hands off to a specialist, and that specialist can still call other agents as tools for narrow subtasks.

Swarm vs. Supervisor:

  • Supervisor: Central coordinator routes all tasks based on runtime state
  • Swarm: Each agent encapsulates its own routing logic, decides independently when to transfer control
  • Swarm distributes routing intelligence; supervisor concentrates it

3d. Google ADK — Workflow Agent Primitives

Google's Agent Development Kit (April 2025) provides explicit workflow agent types:

  • SequentialAgent: Executes sub-agents in predefined order (Parser -> Extractor -> Summarizer)
  • ParallelAgent: Runs sub-agents simultaneously (Security Auditor + Style Enforcer + Performance Analyst, then Synthesizer merges)
  • LoopAgent: Iterative execution with exit signals (escalate=True for early completion, max_iterations as hard limit)

These are the most explicit fan-out/fan-in primitives in any framework — named types rather than patterns you compose.

3e. XState/Stately Agent — Deterministic Core, Agentic Shell

The pattern from David Fetterman (February 2026) applies Gary Bernhardt's "Functional Core, Imperative Shell" to agents:

  • Deterministic Core: XState machines handle workflow logic, state transitions, business rules
  • Agentic Shell: LLM agents manage conversation flow and natural language interpretation
  • Tools as membrane: Translate agent intent to machine events

Key insight: The machine enforces rules while the agent handles creative work. Two critical tools: get_current_state (query machine) and take_action (send typed events). Dynamic tool swapping based on machine state — agent capabilities change as the workflow progresses.
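A minimal sketch of that membrane in XState v5 terms, assuming a toy task workflow. The state names and events are invented for illustration, but the two-tool surface (get_current_state, take_action) follows the pattern described above:

```typescript
import { createMachine, createActor } from "xstate";

// Deterministic core: the workflow shape is fixed here, not decided by the LLM.
const taskMachine = createMachine({
  id: "task",
  initial: "pending",
  states: {
    pending: { on: { START: "inProgress" } },
    inProgress: { on: { SUBMIT: "qa", BLOCK: "blocked" } },
    qa: { on: { PASS: "done", FAIL: "inProgress" } },
    blocked: { on: { UNBLOCK: "inProgress" } },
    done: { type: "final" },
  },
});

const actor = createActor(taskMachine).start();

// Agentic shell: the only two capabilities the agent sees.
const get_current_state = () => String(actor.getSnapshot().value);

const take_action = (event: { type: "START" | "SUBMIT" | "BLOCK" | "UNBLOCK" | "PASS" | "FAIL" }) => {
  actor.send(event);          // events with no matching transition leave the state unchanged
  return get_current_state(); // the machine, not the agent, decides what state comes next
};
```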

Production architecture:

Non-deterministic (thin) -> Deterministic (thick) -> Storage
Agent interprets        -> Machine validates    -> Postgres persists

4. Pattern Taxonomy: State Machine vs. LLM-Driven Orchestration

The fundamental axis in orchestration design is who decides what happens next.

State Machine Driven

The orchestrator is a state machine executing predetermined steps. LLM is called only at bounded decision points.

Characteristics:

  • Explicit state transitions, fully auditable
  • Deterministic routing (same state + event = same transition)
  • Testable without LLM involvement
  • Failure modes are enumerable
  • Cannot adapt to unforeseen situations

Best for: Compliance-critical workflows, multi-step sequences requiring audit trails, known-shape processes.

LLM Driven

The agent decides when to call what tool, plans ahead, reflects on results, backtracks or revises strategy.

Characteristics:

  • Handles unstructured problems with uncertain paths
  • Can adapt to novel situations
  • Non-deterministic — 1% failure compounds (10-step process = ~90.4% success)
  • Harder to audit and debug
  • More expensive (reasoning overhead)

Best for: Open-ended tasks, creative work, research with unknown scope.

Hybrid (The Production Pattern)

The dominant 2025-2026 pattern: rigid orchestrator containing intelligent routing agents.

  • State machine handles workflow shape (what phases exist, what order)
  • LLM handles decisions within phases (how to implement, what to search for)
  • Tools bridge the two (machine validates agent intent)

This is exactly what the operator already does. The operator is a stateless loop reading XState machines via CLI. Each state is a skill name. The LLM decides how to execute the skill, but the machine decides what skill runs next. Circuit breaker is a machine guard, not LLM judgment (except for cross-task trips).


5. Fan-Out / Fan-In Patterns Across Systems

The "fan-out work, fan-in results" pattern appears in every framework but with different semantics:

| System | Fan-Out Mechanism | Fan-In Mechanism | State During Parallel |
|---|---|---|---|
| Claude Code Subagents | Multiple Agent tool calls in same message | Parent reads all return strings, synthesizes | No shared state; parent waits |
| Claude Code Agent Teams | Team lead spawns teammates + shared task list | Lead synthesizes; teammates can message each other | Shared task list + mailbox |
| LangGraph | Send API to multiple nodes | Reducer functions merge typed state | Explicit state schema with reducers |
| OpenAI Agents SDK | asyncio.gather() for parallel agent calls | Manager combines outputs | Conversation history handoff |
| Google ADK | ParallelAgent wrapper | Synthesizer agent post-parallel | ADK session state |
| Anthropic Harness | Sequential (Planner -> Generator -> Evaluator) | File-based handoffs | Files on disk |
| Anthropic Managed Agents | Multiple stateless brains on same session | Shared event log | Append-only session events |

Key finding: The synthesis step is where most fan-in systems fail. An aggregator told to "combine everything" without quality criteria, priority rankings, or conflict resolution rules produces bloated or arbitrary output. Effective fan-in requires explicit merge policies.


6. The Operator's Current Pattern and Extension Points

The existing operator (agents/operator.md) implements a state-machine-driven sequential orchestrator:

Read State -> Pick Next Task -> Invoke Skill (Agent tool) -> Read Result -> Advance Machine -> Loop

What it already does well:

  • Clean separation: XState machines own state, operator owns coordination, skills own execution
  • Stateless loop — all state in filesystem, resumable by design
  • Circuit breaker pattern with both automatic (retry exhaustion) and judgment (blocked/WIP) trips
  • File-based handoffs (progress.md, operator.md) for inter-agent communication
  • GAN-inspired generator-evaluator separation (do-work -> qa)

Where parallelism could enter:

  1. Independent tasks in parallel: When tasks have no dependencies, fire multiple Agent tool calls in one message. This uses the subagent primitive directly. The state ledger would need to support concurrent transitions.

  2. Parallel QA: Run QA on multiple completed tasks simultaneously. Each QA agent gets its own worktree isolation.

  3. Agent Teams for complex features: For features with many interrelated tasks, the operator could spawn a team instead of sequential skill invocations. The team lead would be the operator itself, with teammates owning individual tasks.

  4. Hybrid: parallel fan-out for independent tasks, sequential for dependent chains. Parse the dependency graph from tasks.md, identify independent sets, fan-out each set, fan-in results, advance all machines, repeat.
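A sketch of option 4's scheduling step, assuming the dependency graph has already been parsed out of tasks.md into task objects (the Task shape and function name here are illustrative):

```typescript
interface Task {
  id: string;
  dependsOn: string[]; // task ids this task waits on
}

// Group tasks into waves: every task in a wave has all of its dependencies
// satisfied by earlier waves, so tasks within one wave can be dispatched in parallel.
function parallelWaves(tasks: Task[]): string[][] {
  const pending = new Map<string, Set<string>>();
  for (const t of tasks) pending.set(t.id, new Set(t.dependsOn));

  const waves: string[][] = [];
  while (pending.size > 0) {
    const ready = [...pending.entries()]
      .filter(([, deps]) => deps.size === 0)
      .map(([id]) => id);
    if (ready.length === 0) throw new Error("cycle in task dependencies");
    waves.push(ready);
    for (const id of ready) pending.delete(id);
    for (const deps of pending.values()) for (const id of ready) deps.delete(id);
  }
  return waves;
}

// Example: T1 and T2 are independent, T3 depends on both.
// parallelWaves(tasks) -> [["T1", "T2"], ["T3"]]
```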


7. Key Recommendations

For the operator's parallel execution design:

  1. Start with subagent fan-out for independent tasks. This is the lowest-risk addition: fire multiple Agent tool calls when the dependency graph says tasks are independent. No new primitives needed — just analyze the task DAG before picking the next task.

  2. Use worktree isolation for parallel code generation. Each parallel subagent should get isolation: worktree to prevent filesystem conflicts. Merge happens after all return.

  3. Don't use Agent Teams for the operator — yet. Agent Teams adds coordination overhead and experimental instability. The operator's value is deterministic execution. Agent Teams is better for exploratory/research phases where inter-agent discussion adds value.

  4. Keep the state machine as the authority. The hybrid pattern (deterministic core, agentic shell) is exactly right. The machine decides task ordering and transitions; the LLM decides how to execute each task. Parallelism is a scheduling optimization, not an architectural change.

  5. Design explicit fan-in policies. When parallel tasks return, the operator needs merge rules: are results independent (just advance each machine separately)? Do they need synthesis? What happens if one fails and others succeed? One possible policy is sketched after this list.

  6. Budget for 15-20x token cost. Anthropic's own data: multi-agent = ~15x chat tokens. Parallel fan-out multiplies this by the parallelism factor. Set per-agent token limits and overall budget caps.
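For recommendation 5, one possible fan-in policy for independent tasks, sketched with placeholder hooks into the operator's existing machinery. advanceMachine and recordRetry stand in for the XState CLI call and the circuit-breaker bookkeeping; they are not real commands.

```typescript
interface FanoutResult {
  taskId: string;
  status: "ok" | "failed";
  summary: string; // the subagent's returned string
}

// Independent tasks need no synthesis: each machine advances on its own,
// and one failure does not block or roll back the successes.
function fanIn(results: FanoutResult[]): void {
  for (const result of results) {
    if (result.status === "ok") {
      advanceMachine(result.taskId, "SUBMIT"); // hand the task to QA
    } else {
      recordRetry(result.taskId); // counts toward the circuit breaker's retry limit
    }
  }
}

declare function advanceMachine(taskId: string, event: string): void;
declare function recordRetry(taskId: string): void;
```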


Sources

topic What are the token cost and context implications of parallel vs. sequential execution?
parent [[claude-code-parallel-primitives/research]]
date 2026-04-12
type research-subtask
status complete

Token Cost and Context Implications of Parallel vs. Sequential Execution

Parallel execution in [[Claude Code]] fundamentally changes the cost equation compared to sequential execution. The core tension: parallel agents trade token efficiency for wall-clock speed. Each parallel agent operates as a separate Claude API session with its own [[context window]], so costs scale with concurrency rather than being amortized into a single growing context.

Cost Model: Multiplicative, Not Additive

Parallel execution produces roughly N-times token consumption for N agents, plus coordination overhead. The scaling is not purely multiplicative because different primitives have different overhead profiles:

[[Subagents]] (Agent tool): Each subagent runs in its own fresh context window. The parent agent's conversation history does not carry over -- only the Agent tool's prompt string provides context. The subagent's final message returns verbatim as the tool result. Token cost = (system prompt + CLAUDE.md + tool definitions + spawn prompt + subagent's own tool calls and reasoning) per subagent. For N parallel subagents, total input tokens are roughly N times the per-subagent cost, not N times the parent's accumulated context.

[[Agent Teams]]: Each teammate is a fully independent [[Claude Code]] instance with its own 1M-token context window. Official documentation states agent teams use approximately 7x more tokens than standard sessions when teammates run in plan mode, though this number varies with team size. A 4-agent team uses 3-4x tokens of a single session; a 5-agent team can reach 5-7x. The multiplier exceeds the raw agent count because of coordination overhead: mailbox messages, task list polling, and broadcast communications all consume additional tokens.

Sequential execution has a different cost profile: a single growing context where each subsequent API call re-sends the entire conversation history. The Nth sequential call costs proportionally more because the context has grown. However, [[prompt caching]] dramatically reduces the effective cost of re-sent content (cache reads cost 0.1x base input price).

Prompt Cache Behavior: Parallel Agents Do Not Share Cache

This is one of the most important findings for cost optimization. [[Prompt caching]] in Claude has several properties that interact poorly with parallel execution:

  1. Cache isolation: Caches are isolated per workspace (as of February 2026). Different organizations never share caches. Within a workspace, cache hits require 100% identical prompt prefixes -- if any byte changes, the cache misses entirely.

  2. Parallel cold-start problem: A cache entry only becomes available after the first response begins streaming. For concurrent requests, Anthropic's documentation explicitly recommends waiting for the first response before sending subsequent requests to ensure cache hits. Parallel subagents firing simultaneously will each cold-start their own cache writes rather than benefiting from a shared cache.

  3. Per-model cache separation: Caches are per-model. If the parent runs on [[Opus]] and subagents run on [[Sonnet]] or [[Haiku]], each model builds its own cache from scratch. Switching a subagent to Haiku for cost savings means Haiku cannot reuse Opus's cached context.

  4. What does get cached across parallel agents: Tool definitions and system prompts that are identical across subagents will share cache -- but only if the requests are serialized (first response completes before second request fires). In practice, truly parallel subagents each pay the 1.25x cache write cost independently.

Quantitative impact: Cache reads cost $0.50/MTok on Opus 4.6 vs. $5/MTok for uncached input (10x cheaper). Cache writes cost $6.25/MTok (1.25x base). For a parallel fan-out of 5 subagents each processing 100K tokens of shared system prompt, sequential execution would pay 1 cache write + 4 cache reads ($0.825 total), while parallel execution pays 5 independent cache writes ($3.125 total) -- roughly 3.8x more expensive on that shared prefix alone.
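The arithmetic behind those figures, spelled out with the prices quoted above and the token counts from the example:

```typescript
const prefixMTok = 0.1;   // 100K tokens of shared system prompt
const cacheWrite = 6.25;  // $/MTok, 1.25x the $5/MTok base input rate
const cacheRead = 0.5;    // $/MTok, 0.1x the base input rate
const agents = 5;

// Sequential: the first call writes the cache, the next four read it.
const sequentialCost = prefixMTok * (cacheWrite + (agents - 1) * cacheRead); // 0.825
// Parallel: every agent cold-starts and pays its own cache write.
const parallelCost = prefixMTok * agents * cacheWrite;                       // 3.125

console.log((parallelCost / sequentialCost).toFixed(1)); // "3.8"
```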

Context Window Pressure and Subagent Result Compression

When subagent results return to the parent, context pressure depends on the execution primitive:

Subagents (Agent tool): The parent receives the subagent's final message verbatim as the Agent tool result. All intermediate tool calls, file reads, and reasoning stay in the subagent's context and never enter the parent. This is the primary context-saving mechanism -- a subagent might read 15 files (150K tokens internally) but return only a 2K-token summary. The parent's context grows only by the size of the returned result, not the subagent's working context.

Agent Teams: Results flow through the mailbox messaging system, not as direct context injection. Each teammate is fully independent -- the parent (team lead) receives messages from teammates but does not inherit their full context. The trade-off is that Agent Teams cannot return results as compactly as subagents because the communication is message-based rather than tool-result-based.

Compaction: [[Auto-compaction]] triggers when context usage hits approximately 83.5% of the window (the buffer is ~33K tokens on a 200K window). Compaction produces a summary typically 60-80% shorter than the original. In multi-agent scenarios, compaction happens independently per agent -- each agent compacts its own context without awareness of other agents' state. The API-level compaction (beta: compact-2026-01-12) allows configuring context_token_threshold with a default of 150K tokens and minimum of 50K.

The 20x Cost Finding: Does Parallelism Multiply It?

Anthropic's [[harness design]] article documented a dramatic cost differential: a solo agent run cost $9 over 20 minutes, while the full three-agent harness (planner + generator + evaluator) cost $200 over 6 hours -- roughly 22x more expensive. Updated results with [[Opus 4.6]] brought this down to $124.70 over 3 hours 50 minutes, with the breakdown:

| Agent Phase | Cost | Time |
|---|---|---|
| Planner | $0.46 | 4.7 min |
| Build Round 1 | $71.08 | 2h 7m |
| QA Round 1 | $3.24 | 8.8m |
| Build Round 2 | $36.89 | 1h 2m |
| QA Round 2 | $3.09 | 6.8m |
| Build Round 3 | $5.88 | 10.9m |
| QA Round 3 | $4.06 | 9.6m |

The 20x cost is not primarily from parallelism -- the harness runs agents sequentially (build then QA, iteratively). The cost comes from longer autonomous sessions with more tool calls, more code generation, and more evaluation cycles. The harness trades cost for quality: the evaluator catches concrete bugs (routing errors, untriggered functions) that a solo agent would miss.

Does adding parallelism to the harness pattern multiply the 20x? Not linearly. If you parallelized the build phase across 3 agents working on different features simultaneously, you would roughly triple the build-phase cost but potentially cut wall-clock time by 2-3x. The evaluator phase would remain sequential (it needs to see the combined output). So a parallelized harness might cost 1.5-2x more than the sequential harness, not 3x, because the QA and planning phases stay constant. The key insight from the article: with Opus 4.6, context resets between rounds were eliminated entirely -- "the agents were run as one continuous session across the whole build, with the Claude Agent SDK's automatic compaction handling context growth along the way."

Practical Cost Optimization Strategies

Model tiering is the highest-leverage optimization for multi-agent workflows:

| Role | Recommended Model | Cost Ratio vs. Opus |
|---|---|---|
| Orchestrator / complex reasoning | Opus 4.6 | 1.0x |
| Code generation / coordination | Sonnet 4.6 | 0.6x input, 0.6x output |
| Exploration / search / simple tasks | Haiku 4.5 | 0.2x input, 0.2x output |

The built-in Explore subagent already uses [[Haiku]] for read-only codebase search. Using Haiku for exploration subagents and Sonnet for implementation typically reduces costs 40-50% compared to using Sonnet for everything.

Context isolation is the second-highest leverage. Subagents that perform verbose operations (running tests, reading logs, fetching docs) should handle that content in their own context and return only summaries. The official docs explicitly recommend this pattern: "delegate verbose operations to subagents so the verbose output stays in the subagent's context while only a summary returns to your main conversation."

Specific strategies:

  1. Keep spawn prompts minimal: Everything in the spawn prompt adds to each subagent's context from the start. Include only file paths, error messages, and decisions the subagent needs.

  2. Cap extended thinking: Set MAX_THINKING_TOKENS=10000 for routine subagent tasks. Thinking tokens are billed as output tokens ($25/MTok on Opus), so an uncapped thinking budget on N parallel agents multiplies this cost N-fold.

  3. Prefer subagents over agent teams for focused tasks. Subagents return results directly to the parent with lower overhead. Agent teams add mailbox, task list, and coordination protocol overhead -- use them only when agents need to communicate with each other.

  4. Serialize cache-dependent requests when possible. If multiple subagents share a large system prompt, staggering their launch (waiting for the first to begin streaming) allows subsequent agents to hit the cache.

  5. Use /compact while cache is warm: Deploy compaction within 5 minutes of the last message while the prompt cache is still valid, rather than after the cache expires.

  6. Move stable instructions to skills: Skills load on-demand, while CLAUDE.md loads at session start for every agent. Specialized instructions in skills avoid inflating every subagent's base context.

  7. Clean up agent teams promptly: Active teammates continue consuming tokens even when idle, as they still process mailbox polls and task list checks.

Cost Comparison Summary

| Scenario | Relative Token Cost | Wall-Clock Speed | Cache Efficiency |
|---|---|---|---|
| Single agent, sequential | 1.0x (baseline) | Slowest | Best (full cache reuse) |
| N subagents, parallel | ~Nx + subagent overhead | Fastest | Poor (parallel cold-starts) |
| N subagents, staggered | ~Nx but better cache | Moderate | Good (serial cache hits) |
| Agent team (3 members) | 3-7x | Fast | Independent per agent |
| Agent team (5 members) | 5-15x | Fastest | Independent per agent |
| Harness (sequential multi-agent) | 10-22x | Slow but thorough | Moderate (context resets) |

The critical insight: parallelism primarily trades dollars for time, and the cost-time tradeoff is rarely linear because coordination overhead, cache misses, and independent context windows all add friction. The most cost-effective parallel pattern is focused subagents with model tiering -- Haiku for exploration, Sonnet for implementation, Opus only for orchestration.

Sources

  1. Manage costs effectively - Claude Code Docs -- Official cost management guide with agent team token costs, rate limit recommendations, and context management strategies
  2. Harness design for long-running application development - Anthropic Engineering -- Source of the 20x cost finding; detailed breakdown of multi-agent harness costs with Opus 4.5 and 4.6
  3. Prompt caching - Claude API Docs -- Complete prompt caching documentation including pricing multipliers, parallel request behavior, cache invalidation rules, and TTL semantics
  4. Create custom subagents - Claude Code Docs -- Subagent architecture, context isolation mechanics, built-in Explore/Plan agents, model selection for subagents
  5. Subagents in the SDK - Claude Code Docs -- SDK-level subagent documentation covering context isolation, parallelization, what subagents inherit vs. what they don't
  6. Orchestrate teams of Claude Code sessions - Claude Code Docs -- Agent Teams architecture, token usage scaling, communication overhead, comparison with subagents
  7. Pricing - Claude API Docs -- Complete model pricing for Opus 4.6, Sonnet 4.6, Haiku 4.5 including cache write/read rates and batch discounts
  8. Compaction - Claude API Docs -- API-level compaction controls, trigger thresholds, integration with prompt caching, token counting behavior
  9. Claude Code Sub Agents - Burn Out Your Tokens - DEV Community -- Real-world token consumption data showing parallel subagents exhausting Pro plan quota in 15 minutes vs. 30 minutes sequential
  10. Claude Code Cost Optimisation Guide - systemprompt.io -- Quantified optimization strategies: model selection saves ~40%, MAX_THINKING_TOKENS is the single biggest lever, typical daily costs $5-15 vs. $20-40 without optimization
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment