Research: Claude Code Parallel Execution Primitives for Operator Agent Enhancement
title: Parallel Execution Primitives for Claude Code Operator Agents
date: 2026-04-12
project: kampus
feature: claude-code-parallel-primitives
type: research
status: complete

Parallel Execution Primitives for Claude Code Operator Agents

Executive Summary

Claude Code has exactly three primitives for parallel execution -- subagents, git worktrees, and Agent Teams -- and the most important thing about all three is what they do not provide: any implicit safety net for concurrent writes. Parallel subagents sharing a working directory produce silent file corruption. No warnings. No conflict markers. No errors. Last write wins at the OS level, and the Edit tool's string replacement fails unpredictably when line counts shift between agents. The git index lock (fatal: Unable to create '.git/index.lock': File exists) fires even when agents write to completely disjoint files. Claude Code will not protect you. The operator must enforce isolation boundaries, or the operator ships corruption.

This is the central finding and it has a direct corollary: the only sanctioned patterns for parallel code generation are strict file partitioning (disjoint file ownership per agent) or worktree isolation (each agent gets its own branch and working directory). Read-only parallelism is safe without either. Everything else is a race condition waiting to surface.

For the operator specifically, the lowest-risk path is subagent fan-out with worktree isolation for independent tasks, where the state machine's dependency graph determines what can run concurrently. This preserves every guarantee the operator already makes -- circuit breaker, retry logic, state machine authority -- while cutting wall-clock time proportional to the parallelism factor. Agent Teams, the peer-to-peer coordination layer shipped in February 2026 behind an experimental flag, is architecturally interesting and operationally fragile: no session resumption, task status synchronization bugs, delegation compliance failures, and 20 documented issues across 10 official and 10 community-discovered reports. Anthropic's launch of Managed Agents (April 2026, public beta) as the production-grade multi-agent offering suggests Agent Teams may remain a power-user local feature indefinitely.

Token costs scale N-times for N parallel agents, compounded by prompt cache misses -- parallel agents cold-start independently rather than sharing cache. Five parallel subagents each processing 100K tokens of shared system prompt pay $3.125 versus $0.825 for sequential execution on that prefix alone. That is a 3.8x penalty on cache alone, before counting anything else. Model tiering -- Haiku for exploration, Sonnet for implementation, Opus only for orchestration -- is the highest-leverage cost optimization, yielding 40-50% savings. The 20x cost premium documented in Anthropic's harness design research comes from longer autonomous sessions with evaluation loops, not from parallelism itself; adding fan-out to the harness pattern multiplies the sequential harness cost by roughly 1.5-2x.

The operator's hybrid architecture -- deterministic state machine core plus agentic skill execution shell -- is exactly the pattern the industry has converged on for production multi-agent systems. David Fetterman named it "deterministic core, agentic shell." Stately AI ships XState bindings for it. Every serious production deployment in 2025-2026 uses some version of it. Parallelism is not an architectural change for the operator. It is a scheduling optimization within the existing architecture. The state machine continues to own task ordering and transitions; the operator gains the ability to dispatch multiple independent transitions simultaneously.


1. Agent Teams Architecture

Agent Teams shipped in February 2026 as a research preview, gated behind CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1. One Claude Code session (the team lead) spawns independent teammate agents, each with its own context window and full tool access. Four components: a team lead, teammates, a shared task list (JSON files at ~/.claude/tasks/{team-name}/), and a per-agent mailbox system (JSON inboxes at ~/.claude/teams/{team-name}/inboxes/).

What makes Agent Teams fundamentally different from subagents is not scale but topology. Subagents are spokes reporting to a hub. Agent Teams are a mesh.

  • Peer-to-peer messaging -- teammates communicate directly, not just through the parent. This enables adversarial debugging, collaborative research, and self-organizing patterns that subagents structurally cannot support.
  • Shared task list with dependency tracking -- tasks have pending/in-progress/completed states, file-locking for claims, and automatic unblocking when dependencies complete.
  • Persistent teammates -- each teammate is a full Claude Code session that persists for the team's duration, unlike ephemeral subagent invocations.
  • Quality gates via hooks -- TeammateIdle, TaskCreated, TaskCompleted hooks enable programmatic governance (e.g., requiring tests to pass before task closure).
  • Plan approval workflow -- teammates can be required to plan before implementing, with the lead reviewing and approving plans.

The stability picture is honest and unflattering. Agent Teams works well for parallel research, independent module development, and competing-hypothesis debugging. It works badly for anything requiring reliability. Twenty documented issues span session management fragility, VS Code integration breakage, tmux race conditions, delegation compliance failures, and task status synchronization bugs. The feature remains behind an experimental flag with no announced GA timeline. The recommended sweet spot from community practice is 3-5 teammates with 5-6 tasks each.

Cost profile: A 3-agent team uses 3-4x the tokens of a single session. Plan approval phases push this to ~7x. The cost is not the problem. The reliability is.


2. Worktree Isolation Mechanics

Git worktrees are how you make parallel code generation safe. They create independent working directories with their own branch, index, and files while sharing the same repository history. Everything else -- file partitioning, careful prompting, hoping agents stay in their lane -- is a brittle approximation of what worktrees give you by construction.

Three entry points:

| Method | Use Case |
|---|---|
| `claude --worktree <name>` | User sessions |
| `isolation: worktree` in subagent frontmatter | Automated parallel code generation |
| EnterWorktree tool | Mid-session isolation |

Worktrees are created at <repo>/.claude/worktrees/<name>/ on a branch named worktree-<name>, branching from origin/HEAD. Base branch selection is not configurable via flag -- requires git remote set-head or a WorktreeCreate hook. This is a real limitation the operator must work around.

Merge semantics: There is no automatic merge. This is a feature, not a missing feature. Worktree branches must be integrated manually via git merge, git cherry-pick, gh pr create, or by asking Claude in the main session. Conflicts surface only at merge time and use standard git conflict resolution. The team lead (or operator) coordinates sequential merging.

Cleanup: Worktrees with no changes are auto-removed when the subagent finishes. Changed worktrees persist for manual review. Orphaned worktrees are cleaned up at startup after 30 days (configurable) if they have no uncommitted changes, untracked files, or unpushed commits.

Monorepo considerations: Worktrees need dependency installation since node_modules is absent. worktree.symlinkDirectories symlinks specified directories from the main repo. worktree.sparsePaths uses git sparse-checkout for large monorepos. Known bug: Claude Code's atomic write pattern replaces symlinks with regular files -- tracked in issue #40857.

The constraint operators must internalize: Subagent results return as text summaries, not diffs or commits. The parent must explicitly interact with the worktree's branch to integrate changes. The operator needs explicit merge logic after parallel subagent completion. There is no shortcut here.
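What that merge logic could look like is straightforward to sketch. The helper below is a minimal illustration, assuming the documented worktree-<name> branch naming and that the operator tracks which branches its subagents produced; the function and its error handling are this sketch's own, not an operator API.

```typescript
// Sketch: serially merge worktree branches after parallel subagents complete.
// The branch list comes from the operator's own bookkeeping.
import { execFileSync } from "node:child_process";

function git(...args: string[]): string {
  return execFileSync("git", args, { encoding: "utf8" });
}

export function mergeWorktreeBranches(branches: string[]): string[] {
  const needsManualResolution: string[] = [];
  for (const branch of branches) {
    try {
      // Serial merges mean only one process touches .git/index at a time,
      // so index.lock contention cannot occur during integration.
      git("merge", "--no-ff", branch, "-m", `Merge ${branch}`);
    } catch {
      // Back out and queue the branch for explicit conflict resolution
      // (by a human, or by Claude in the main session).
      git("merge", "--abort");
      needsManualResolution.push(branch);
    }
  }
  return needsManualResolution;
}

// Example: mergeWorktreeBranches(["worktree-auth-task", "worktree-api-task"]);
```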


3. Parallel Write Safety

Parallel subagents in Claude Code are genuinely concurrent -- they fire simultaneously, not sequentially. Without isolation, they share the same working directory. This is where every naive "just run N agents" approach breaks.

Three findings, in order of severity:

1. Concurrent writes to the same file produce silent corruption. Last-write-wins at the OS level. The Edit tool's string replacement fails unpredictably when line counts change between agents. No warnings, no errors, no conflict markers. The file is simply wrong, and nobody tells you.

2. Git index lock contention causes commit failures. Git's .git/index.lock mutex means concurrent git add or git commit calls fail with fatal: Unable to create '.git/index.lock': File exists -- even when agents write to completely disjoint files. This is not a Claude Code bug. It is how git works. Issue #28823 documents the pattern.

3. Claude Code provides no implicit safety net. No filesystem-level locking. No copy-on-write. No transaction semantics. No warnings when you fire multiple subagents at the same working directory. The system trusts you to know what you are doing.

Five sanctioned patterns for safe parallel code generation:

| Pattern | How It Works | When to Use |
|---|---|---|
| Strict file partitioning | Each agent owns disjoint files; commits serialized after completion | Different modules, no shared files |
| Worktree isolation | Each agent gets its own branch/directory/index | Any overlapping writes |
| Read-only parallel, sequential writes | Parallel research with read-only tools; parent applies changes | Multiple perspectives on same files |
| File-based coordination with post-merge | Agents in separate clones; git sync forces conflict resolution | Large-scale (Anthropic's 16-agent C compiler approach) |
| Agent Teams with shared task list | Teammates in own worktrees; file-locked task claims | Complex multi-agent coordination |

The lesson from Anthropic's own 16-agent C compiler build is that even Anthropic does not trust its agents to share a working directory. They used separate clones with git sync. The operator should not be more trusting than its maker.


4. Orchestration Patterns

The orchestration landscape divides into three categories, and the third is eating the other two.

State-machine-driven orchestration -- what the operator currently does -- provides explicit transitions, full auditability, and testable-without-LLM determinism, but cannot adapt to unforeseen situations. LLM-driven orchestration handles unstructured problems but compounds failure rates: a 10-step process at 99% per-step reliability delivers 90.4% overall reliability. Stack ten agents and the system fails roughly once in ten runs. The hybrid pattern -- deterministic core with agentic shell -- is the dominant 2025-2026 production architecture, and the operator already implements it.

Anthropic's own research delivers the sharpest indictment of pure multi-agent orchestration: 57% of multi-agent project failures originate in orchestration design, not agent capability. The synthesis/fan-in step is where systems die -- an aggregator without explicit merge policies produces bloated or arbitrary output. The industry is learning what the operator already knows: the machine must own the rules.

Cross-framework fan-out/fan-in comparison:

| System | Fan-Out | Fan-In | Key Differentiator |
|---|---|---|---|
| Claude Code subagents | Multiple Agent tool calls in same message | Parent reads return strings, synthesizes | Simplest; no inter-child communication |
| Claude Code Agent Teams | Team lead spawns teammates + shared task list | Lead synthesizes; peers message each other | Peer-to-peer communication |
| LangGraph | Send API to multiple nodes | Reducer functions merge typed state | Structured state with explicit reducers |
| OpenAI Agents SDK | asyncio.gather() for parallel calls | Manager combines outputs | Handoffs + agents-as-tools |
| Google ADK | ParallelAgent named type | Synthesizer agent post-parallel | Most explicit primitives |
| Anthropic Managed Agents | Multiple stateless brains on same session | Shared event log | Brain/Hands/Session decoupling |

The XState/Stately Agent pattern maps directly to the operator's design. The machine enforces rules while the agent handles creative work. Two bridging tools -- get_current_state and take_action -- translate between the agent and machine domains. David Fetterman's "Deterministic Core, Agentic Shell" blog post names the pattern the operator has been running. The operator is not an experiment. It is the architecture the industry converged on.


5. Token Cost Implications

Cost scaling is multiplicative, not additive. This is the fact that every "just parallelize it" proposal must confront honestly.

N parallel subagents cost N-times, plus coordination overhead. The overhead is not fixed -- it depends on which primitive you use and how you manage the prompt cache.

| Scenario | Relative Token Cost | Wall-Clock Speed | Cache Efficiency |
|---|---|---|---|
| Single agent, sequential | 1.0x (baseline) | Slowest | Best |
| N subagents, parallel | ~Nx + overhead | Fastest | Poor (parallel cold-starts) |
| N subagents, staggered | ~Nx, better cache | Moderate | Good (serial cache hits) |
| Agent team (3 members) | 3-7x | Fast | Independent per agent |
| Agent team (5 members) | 5-15x | Fastest | Independent per agent |
| Sequential multi-agent harness | 10-22x | Slow but thorough | Moderate |

Prompt cache behavior is the hidden cost multiplier that nobody talks about. Parallel agents cold-start independently rather than sharing cache. Five parallel subagents each processing 100K tokens of shared system prompt pay ~3.8x more than sequential execution on that prefix alone ($3.125 vs. $0.825). Staggering launches -- waiting for the first response before firing subsequent agents -- restores cache benefits but sacrifices the wall-clock speed that was the entire point of parallelizing. This is a genuine tradeoff with no free lunch.
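The arithmetic behind that 3.8x figure is worth making explicit. The sketch below assumes a $5-per-million-token input price with the standard 1.25x cache-write and 0.1x cache-read multipliers; the prices are an assumption, the shape of the calculation is not.

```typescript
// Cache-miss penalty for parallel fan-out. Pricing assumptions:
// $5/M input tokens, cache write at 1.25x base, cache read at 0.1x base.
const BASE = 5 / 1_000_000;       // $ per input token
const CACHE_WRITE = 1.25 * BASE;  // uncached pass over the shared prefix
const CACHE_READ = 0.1 * BASE;    // cached pass over the shared prefix

const agents = 5;
const prefixTokens = 100_000;

// Parallel launch: every agent cold-starts and pays the cache-write rate.
const parallelCost = agents * prefixTokens * CACHE_WRITE; // $3.125

// Sequential (or staggered) launch: one cache write, then cached reads.
const sequentialCost =
  prefixTokens * CACHE_WRITE +
  (agents - 1) * prefixTokens * CACHE_READ; // $0.825

console.log((parallelCost / sequentialCost).toFixed(1) + "x"); // "3.8x"
```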

Model tiering is where the real savings live:

| Role | Recommended Model | Cost vs. Opus |
|---|---|---|
| Orchestrator / complex reasoning | Opus 4.6 | 1.0x |
| Code generation / coordination | Sonnet 4.6 | 0.6x |
| Exploration / search / simple tasks | Haiku 4.5 | 0.2x |

A 40-50% cost reduction from model tiering alone. Routing exploration subagents to Haiku at $0.25/$1.25 per million tokens versus Opus at $15/$75 is not an optimization. It is the difference between sustainable and ruinous.

Practical strategies for the operator:

  • Keep spawn prompts minimal -- everything in the prompt inflates every subagent's context
  • Cap extended thinking (MAX_THINKING_TOKENS=10000) for routine tasks
  • Move stable instructions to skills (loaded on-demand vs. CLAUDE.md loaded at session start)
  • Clean up agent teams promptly -- idle teammates still consume tokens via polling
  • Use subagents over Agent Teams for focused tasks (lower overhead, higher reliability)

Recommendation Matrix

Which Primitive for Which Use Case

The operator's primitive selection is not a preference. It is a safety decision.

| Use Case | Primitive | Isolation | Model | Risk Level |
|---|---|---|---|---|
| Parallel research / analysis (no writes) | Subagents | None needed | Haiku | Low |
| Parallel code generation, disjoint files | Subagents | File partitioning | Sonnet | Low-Medium |
| Parallel code generation, possible overlap | Subagents | Worktree | Sonnet | Medium |
| QA on multiple completed tasks | Subagents | Worktree (read from branches) | Sonnet | Low |
| Complex multi-step with inter-agent discussion | Agent Teams | Worktree (implicit) | Mixed | High (experimental) |
| Single-file changes from multiple perspectives | Sequential subagents | None | Sonnet | Low |
| Large-scale parallel (10+ agents) | Manual worktrees or Docker containers | Full isolation | Mixed | Medium-High |

Primitive Comparison for Operator Design

The question is not which primitive is "best." The question is which primitive the operator can trust with its reliability contract.

| Dimension | Subagents | Agent Teams | Manual Worktrees |
|---|---|---|---|
| Readiness | Production | Experimental | Production |
| Coordination | Parent manages all | Self-coordinating | Human manages |
| Communication | Results to parent only | Peer-to-peer + task list | None |
| Cost | N-times + overhead | 3-7x per 3 agents | Variable |
| Operator integration | Natural (Agent tool calls) | Requires env var opt-in | Manual orchestration |
| Reliability | High | Medium (known issues) | High |
| Session resumption | N/A (ephemeral) | Not supported | N/A |
| Merge burden | Parent merges worktree branches | Lead merges | Human merges |

Subagents with worktree isolation are the only combination that is both production-ready and a natural fit for the operator's Agent tool calls. Agent Teams becomes interesting when -- and only when -- it exits experimental status and gains session resumption. Manual worktrees are the escape hatch for anything subagents cannot handle.


What Can Be Built Now vs. What Needs Experimentation

Build Now

These use production-ready primitives and well-understood patterns. No open research questions block them.

  1. Dependency graph analysis before dispatch. Parse tasks.md for independent task sets. Tasks with no shared dependencies can be fanned out in a single Agent tool response. The state machine already knows the dependency graph. Use it. A sketch of this wave computation appears after this list.

  2. Subagent fan-out for independent tasks with worktree isolation. Each parallel subagent gets isolation: worktree. The operator fires multiple Agent tool calls in one message. Results return as text summaries; the operator merges worktree branches sequentially. This is the bread-and-butter pattern.

  3. Read-only parallel research. Subagents with tools restricted to Read, Grep, Glob can safely run in parallel without isolation. No writes, no races. Use for codebase analysis, audit, or information gathering phases.

  4. Model tiering for cost optimization. Route exploration subagents to Haiku, implementation to Sonnet, keep Opus for the operator itself. This is not optional if the operator runs at any scale.

  5. Sequential commit serialization. After parallel subagents complete (in their worktrees), the operator merges branches one at a time. Git index lock contention is structurally impossible when merges are serial.

  6. Circuit breaker per parallel branch. Each parallel subagent's failure is isolated. If one fails, the operator retries that task independently without affecting completed parallel tasks. The circuit breaker pattern the operator already owns extends naturally to parallel branches.
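The wave computation referenced in item 1 is plain graph bookkeeping. A minimal sketch, assuming each task parsed from tasks.md carries an explicit dependsOn list (the Task shape and the parsing are assumptions of this sketch):

```typescript
// Sketch: group tasks into dispatch waves. Every task in a wave has all of
// its dependencies satisfied by earlier waves, so tasks within one wave are
// independent and can be fanned out as parallel subagents.
interface Task {
  id: string;
  dependsOn: string[]; // hypothetical field parsed from tasks.md
}

export function planWaves(tasks: Task[]): Task[][] {
  const done = new Set<string>();
  let remaining = [...tasks];
  const waves: Task[][] = [];

  while (remaining.length > 0) {
    const ready = remaining.filter((t) => t.dependsOn.every((d) => done.has(d)));
    if (ready.length === 0) {
      throw new Error("Dependency cycle or unknown task id in tasks.md");
    }
    waves.push(ready);
    for (const t of ready) done.add(t.id);
    remaining = remaining.filter((t) => !done.has(t.id));
  }
  return waves; // each inner array is one parallel fan-out
}
```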

Needs Experimentation

These are feasible but unvalidated in the operator context. Each carries an empirical question that cannot be answered by reading docs.

  1. Fan-in merge policies. When parallel tasks return, what happens if one fails and others succeed? What if results conflict? The operator needs explicit rules. "2 of 3 succeeded" is not a state the current machine represents. These policies must be designed, implemented, and stress-tested with real runs.

  2. Worktree merge conflict resolution at scale. How often do tasks the operator categorizes as "independent" actually produce merge conflicts? How well does Claude resolve them? Nobody knows. This needs empirical data from real operator runs, not assumptions.

  3. Staggered subagent launches for cache optimization. Waiting for the first subagent's response before launching subsequent ones restores cache benefits. But it sacrifices wall-clock time -- the one thing parallelism was supposed to buy. The crossover point where staggering beats full parallelism depends on task duration and prompt size, both of which vary.

  4. Agent Teams as an operator execution backend. The operator could spawn a team for complex features instead of sequential skill invocations. But Agent Teams' experimental status, session management fragility, and delegation compliance issues need stress-testing before this is viable. The operator's reliability contract is non-negotiable.

  5. Dynamic isolation selection. Can the operator detect file overlap before dispatching and choose between file partitioning (cheaper) and worktree isolation (safer)? This requires analyzing task descriptions to predict file sets -- which is asking an LLM to predict another LLM's behavior. Approach with skepticism.

  6. Parallel state machine transitions. The XState machine needs to support concurrent transitions for tasks in different states. This may require machine redesign (parallel states or multiple active state nodes). The machine currently assumes one active transition at a time. Changing this touches the operator's core invariant. A sketch of the parallel-states option follows this list.

  7. Cost-bounded parallelism. Set per-agent token limits and abort parallel branches that exceed budget. The right thresholds are unknown. Too low and you get premature termination on legitimate work. Too high and you have no bound at all.
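The sketch referenced in item 6: XState does support parallel states, which would let the root machine track several in-flight tasks while remaining the single source of truth. The illustration below hard-codes two task regions with hypothetical event names; whether dynamically sized task sets fit this shape is exactly the open design question.

```typescript
import { createMachine } from "xstate";

// Sketch only. In a real operator the regions would be derived from the
// dependency waves rather than written by hand.
const parallelDispatch = createMachine({
  id: "parallelDispatch",
  type: "parallel",
  states: {
    taskA: {
      initial: "running",
      states: {
        running: { on: { TASK_A_DONE: "merged", TASK_A_FAIL: "failed" } },
        failed: { on: { TASK_A_RETRY: "running" } },
        merged: { type: "final" },
      },
    },
    taskB: {
      initial: "running",
      states: {
        running: { on: { TASK_B_DONE: "merged", TASK_B_FAIL: "failed" } },
        failed: { on: { TASK_B_RETRY: "running" } },
        merged: { type: "final" },
      },
    },
  },
});
// The machine completes only when every region reaches its final state --
// one natural place to attach an explicit fan-in / merge policy.
```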


Open Questions

These are the questions that do not have answers in the documentation, the community, or Anthropic's own engineering posts. They require building, measuring, and deciding.

  • How does the XState machine represent concurrent execution? Parallel states? Multiple active nodes? External tracking of subagent handles? The current machine assumes sequential transitions. Every answer here changes the operator's core contract.

  • What is the real-world merge conflict rate for "independent" tasks? The operator categorizes tasks as independent based on dependency analysis. How often is it wrong? Is file partitioning sufficient for a typical feature build, or does worktree isolation always pay for itself? This determines whether the cheaper option is viable.

  • What does partial fan-in failure look like? Two of three parallel tasks succeed and one fails. Advance the successes and retry the failure? Roll back all three? The answer depends on whether the successful tasks' outputs are valid in isolation -- and that depends on the specific feature, not on any general rule.

  • What is the cost-optimal parallelism factor? At what N does cache miss overhead outweigh wall-clock savings? For five agents with 100K shared prefix, the cache penalty is 3.8x. For ten agents, it is worse. The crossover point depends on task duration, and nobody has published measurements.

  • Can the circuit breaker extend to parallel execution? Trip the circuit breaker for the entire parallel batch if the failure rate exceeds threshold? Or per-branch only? The operator's existing circuit breaker was designed for sequential execution. The semantics under parallelism are genuinely ambiguous.

  • Staggered or full parallel? Staggering preserves cache. Full parallel preserves speed. The right answer depends on whether you are cost-constrained or time-constrained, and that changes per run. Is this a runtime decision the operator should make dynamically, or a configuration choice?

  • What is the minimum viable task duration for worktree overhead? Worktree creation and teardown are not free. For a 30-second subagent task, worktree overhead may dominate execution time. Where is the breakpoint below which you should just use file partitioning?

  • When will Agent Teams exit experimental? Anthropic shipped Managed Agents in April 2026 as the production multi-agent offering. Will Agent Teams remain the local/power-user feature? Will Managed Agents subsume it? The answer determines whether investing in Agent Teams integration is a bet on the platform or a bet against it.


Sources

Anthropic Official Documentation

Anthropic Engineering & Research

Community & Ecosystem

External Frameworks & Patterns

GitHub Issues

How Do Agent Teams Work Internally?

Research date: 2026-04-12 Status: Complete

Overview

Agent Teams is an experimental feature shipped in Claude Code v2.1.32 (February 2026) alongside Opus 4.6. It enables one Claude Code session (the "team lead") to spawn independent teammate agents, each with its own context window and full tool access. Teammates communicate peer-to-peer through a mailbox system and coordinate through a shared task list — fundamentally different from subagents, which only report results back to a parent.

Enabled via CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 in environment or settings.json.


Architecture

Four components compose an agent team:

| Component | Role |
|---|---|
| Team Lead | Main Claude Code session. Creates team, spawns teammates, coordinates work, synthesizes results. Cannot be transferred. |
| Teammates | Separate Claude Code instances with independent context windows. Load same project context (CLAUDE.md, MCP servers, skills) but NOT the lead's conversation history. |
| Task List | Shared work items stored at ~/.claude/tasks/{team-name}/. JSON files with dependency tracking. |
| Mailbox | Per-agent inbox files at ~/.claude/teams/{team-name}/inboxes/{agent}.json. Async message delivery. |

Team config lives at ~/.claude/teams/{team-name}/config.json — runtime state (session IDs, tmux pane IDs, member list). Auto-generated; hand edits are overwritten.

Team Config Structure

{
  "name": "string",
  "leadAgentId": "string@team",
  "members": [
    {
      "agentId": "string@team",
      "name": "string",
      "agentType": "string",
      "backendType": "in-process|tmux|iterm2",
      "model": "haiku|sonnet|opus",
      "planModeRequired": boolean,
      "cwd": "string"
    }
  ]
}

Environment Variables Auto-Set for Teammates

  • CLAUDE_CODE_TEAM_NAME
  • CLAUDE_CODE_AGENT_ID
  • CLAUDE_CODE_AGENT_NAME
  • CLAUDE_CODE_PLAN_MODE_REQUIRED

The Seven Internal Tools

Agent Teams are built from seven tools that Claude can call:

| Tool | Purpose |
|---|---|
| TeamCreate (spawnTeam) | Initializes team namespace, creates config.json and directory structure, establishes leader |
| TaskCreate | Writes JSON task file to ~/.claude/tasks/{team-name}/ with description, status, owner, dependencies |
| TaskUpdate | Claim tasks (status: "in_progress"), mark completion, assign owner, declare dependencies via addBlockedBy: [taskIds] |
| TaskList | Returns all tasks with current status; teammates poll this to find unclaimed work |
| TaskGet | Fetch single task details |
| SendMessage | Peer-to-peer messaging with multiple message types (see below) |
| TeamDelete (cleanup) | Removes team config and task files after all teammates shut down |

Additional operations exist on the TeammateTool: discoverTeams, requestJoin/approveJoin/rejectJoin, requestShutdown/approveShutdown/rejectShutdown, approvePlan/rejectPlan.


Shared Task List Semantics

Task States

pending --> in_progress --> completed

How Tasks Are Created

The lead creates tasks via TaskCreate. Each task is a JSON file at ~/.claude/tasks/{team-name}/N.json with fields: ID, subject, description, status, owner, dependencies.

How Tasks Are Claimed

Two modes:

  1. Lead assigns: Lead explicitly tells a specific teammate to take a task.
  2. Self-claim: After finishing a task, a teammate calls TaskList(), finds unclaimed/unblocked tasks, and claims one via TaskUpdate (sets status: "in_progress" and owner field).

Race condition prevention: Task claiming uses file locking to prevent multiple teammates from claiming the same task simultaneously.
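The underlying technique is ordinary exclusive-create locking on the shared filesystem. The sketch below illustrates that general pattern; it is not Claude Code's actual claim implementation, which is not published.

```typescript
import { openSync, closeSync, readFileSync, writeFileSync, unlinkSync } from "node:fs";

// Sketch: atomically claim a task file by taking an exclusive lock file first.
function claimTask(taskPath: string, agentId: string): boolean {
  const lockPath = `${taskPath}.lock`;
  let fd: number;
  try {
    // "wx" fails if the lock file already exists -- atomic on the filesystem.
    fd = openSync(lockPath, "wx");
  } catch {
    return false; // another teammate is claiming this task right now
  }
  try {
    const task = JSON.parse(readFileSync(taskPath, "utf8"));
    if (task.status !== "pending" || task.owner) return false; // already taken
    task.status = "in_progress";
    task.owner = agentId;
    writeFileSync(taskPath, JSON.stringify(task, null, 2));
    return true;
  } finally {
    closeSync(fd);
    unlinkSync(lockPath);
  }
}
```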

Dependency Tracking

Tasks can declare dependencies via addBlockedBy: [taskIds]. A pending task with unresolved dependencies cannot be claimed. When a blocking task completes, dependent tasks auto-unblock without manual intervention. Work executes in waves based on dependency chains.

Known Issue: Status Lag

Teammates sometimes fail to mark tasks as completed, which blocks dependent tasks. Workaround: manually update task status or tell the lead to nudge the teammate. (This is documented as a known limitation.)


Mailbox / Messaging System

Architecture

Each teammate has a dedicated inbox file at ~/.claude/teams/{team-name}/inboxes/{agent}.json. Messages are JSON objects written to these files. Delivery is automatic — the lead doesn't need to poll.

Message Types

| Type | Structure | Sender | Purpose |
|---|---|---|---|
| Regular message | {from, text, timestamp, read} | Any agent | Standard communication |
| broadcast | Same as regular, sent to all | Any agent | Team-wide announcements |
| shutdown_request | {type, requestId, from, reason, timestamp} | Leader | Ask teammate to shut down |
| shutdown_approved | {type, requestId, from, paneId, backendType} | Teammate | Confirm shutdown |
| idle_notification | {type, from, timestamp, completedTaskId} | Teammate | Signal work complete |
| task_completed | {type, from, taskId, taskSubject} | Teammate | Task done notification |
| plan_approval_request | {type, from, requestId, planContent} | Teammate (plan mode) | Submit plan for review |
| permission_request | {type, requestId, workerId, toolName, description, input} | Teammate | Bubble up permission ask |
| join_request | {type, proposedName, requestId, capabilities} | Requestor | Request to join team |

Broadcast vs. Direct

  • Direct (write): Send to one specific teammate by name. Cheap, targeted.
  • Broadcast: Send to ALL teammates simultaneously. Expensive — costs scale linearly with team size (one message per member context window). Use sparingly.

Ordering Guarantees

Messages are written to JSON inbox files with timestamps. No strong ordering guarantees are documented beyond timestamp-based sequencing. This is an async system, not a real-time channel.


Team Lead vs. Teammate Roles

Team Lead

  • Creates the team and spawns teammates
  • Assigns tasks (or lets teammates self-claim)
  • Approves/rejects plans when planModeRequired is set
  • Synthesizes results across all teammates
  • Initiates shutdown and cleanup
  • Fixed for team lifetime — cannot transfer leadership
  • Can enter delegate mode (Shift+Tab) to restrict itself to coordination-only tools (no code touching)

Teammates

  • Full independent Claude Code sessions with own context window
  • Load project context (CLAUDE.md, MCP servers, skills) but NOT lead's conversation history
  • Can message any other teammate by name (peer-to-peer)
  • Can self-claim tasks from the shared task list
  • Can reject shutdown requests with explanation
  • Can use subagent definitions as role templates (tools allowlist, model, instructions)
  • Cannot spawn their own teams or teammates (no nesting)
  • Team coordination tools (SendMessage, task management) remain available even when a subagent definition's tools field restricts other tools

Permissions

All teammates inherit the lead's permission settings at spawn time. If lead uses --dangerously-skip-permissions, all teammates do too. Can change individual modes after spawn, but not at spawn time.


Display Modes / Spawn Backends

| Mode | How It Works | Requirements |
|---|---|---|
| in-process | Same Node.js process, hidden panes. Navigate with Shift+Down. | Any terminal. Default outside tmux. |
| tmux | Separate tmux panes, visible simultaneously. | tmux installed. |
| iterm2 | Split panes in iTerm2. | iTerm2 + it2 CLI + Python API enabled. |

Auto-detection: checks $TMUX env var, then $TERM_PROGRAM for iTerm2, then falls back to in-process.

Configure globally in ~/.claude.json:

{ "teammateMode": "in-process" }

Or per-session: claude --teammate-mode in-process


Quality Gates: Hooks

Three hook events for team governance:

| Hook | When It Fires | Exit Code 2 Effect |
|---|---|---|
| TeammateIdle | Teammate about to go idle | Send feedback, keep teammate working |
| TaskCreated | Task being created | Prevent creation with feedback |
| TaskCompleted | Task being marked complete | Prevent completion with feedback |

Use cases: enforce test passing before task closure, require lint success, auto-assign follow-up tasks.
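A sketch of the test-passing gate as a TaskCompleted hook, written as a Node script. It assumes the usual hook contract (JSON payload on stdin; exit code 2 blocks the action and feeds stderr back), which matches the table above; the payload field read here is illustrative rather than a documented schema.

```typescript
#!/usr/bin/env node
// TaskCompleted quality gate sketch: block completion unless the test suite passes.
import { execSync } from "node:child_process";
import { readFileSync } from "node:fs";

const payload = JSON.parse(readFileSync(0, "utf8")); // hook input on stdin

try {
  execSync("npm test --silent", { stdio: "pipe" });
  process.exit(0); // tests pass: allow the task to be marked completed
} catch {
  // Exit code 2 prevents completion; stderr is returned to the teammate as feedback.
  console.error(`Tests must pass before completing task ${payload.taskId ?? "(unknown)"}.`);
  process.exit(2);
}
```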

Plan Approval Workflow

  1. Teammate spawned with planModeRequired: true
  2. Teammate works in read-only plan mode — can explore but not modify
  3. Teammate submits plan_approval_request to lead
  4. Lead reviews, approves or rejects with feedback
  5. If rejected: teammate revises and resubmits
  6. If approved: teammate exits plan mode, begins implementation

Lead makes approval decisions autonomously. Influence via prompt: "only approve plans that include test coverage."


Limitations (Complete List)

Documented Limitations

  1. No session resumption: /resume and /rewind do not restore in-process teammates. Lead may try to message dead teammates. Workaround: tell lead to spawn new ones.
  2. No nested teams: Teammates cannot spawn their own teams or teammates.
  3. One team per session: Clean up current team before starting a new one.
  4. Lead is fixed: Cannot promote a teammate to lead or transfer leadership.
  5. Task status can lag: Teammates sometimes forget to mark tasks completed, blocking dependents.
  6. Shutdown can be slow: Teammates finish current request/tool call before exiting.
  7. Permissions set at spawn: All teammates start with lead's mode. No per-teammate modes at spawn time.
  8. Split panes limited: Not supported in VS Code integrated terminal, Windows Terminal, or Ghostty.
  9. No project-level team config: .claude/teams/teams.json in project dir is NOT recognized as configuration.
  10. Subagent skills and mcpServers frontmatter not applied when running as teammate.

Undocumented / Community-Discovered Issues

  1. VS Code extension: tools unavailable — TeammateTool, SendMessage, spawnTeam not available in VS Code extension even with env var set (#28048).
  2. VS Code: message delivery broken — Messages not delivered to team lead; permission prompts invisible, causing deadlock (#25254).
  3. tmux race condition — send-keys fires before shell is ready in new pane, teammates fail to start (#40168).
  4. IPC socket hang — Lead session hangs indefinitely when unix socket peer disconnects, no timeout (#33043).
  5. CLAUDE_CONFIG_DIR not inherited — Spawned teammates don't respect config dir, fail to share task list (#23676).
  6. Shift+Up/Down auto-sends — Pre-filled suggestion text sent to wrong agent when switching panes (#26511).
  7. Bedrock model mismatch — Teammates spawned with non-Bedrock model ID via --model flag on Bedrock (#23561).
  8. Memory leak (fixed) — Completed teammate tasks never garbage collected from session state.
  9. Model overrides delegation — Claude ignores mandatory delegation triggers in CLAUDE.md, performs work inline "for efficiency" even when Agent Teams is enabled (#42856). Governance hooks (PreToolUse) don't fire in Agent tool subagents, so bypassing delegation breaks the entire governance architecture.
  10. TaskUpdate status not synced between team and session task lists (#23629).

Stability Assessment

Experimental Status

Agent Teams shipped as a research preview in v2.1.32. The feature is gated behind an environment variable and has a prominent "experimental" warning in docs. It is NOT considered production-ready by Anthropic.

What Works Well

  • Parallel research/review tasks with clear boundaries
  • Competing hypothesis debugging (adversarial pattern)
  • Independent module development where teammates own separate file sets
  • The C compiler project (16 agents, 100K lines of Rust, compiled Linux kernel) validated the core architecture at scale

What's Fragile

  • Session management: No resume, no rewind, dead teammates after disconnect
  • VS Code integration: Multiple blocking issues make it effectively broken in VS Code
  • tmux spawning: Race conditions in pane initialization
  • Task tracking reliability: Status lag is a known, unresolved issue
  • Delegation compliance: Model sometimes ignores team structure and does work inline
  • Cost control: Each teammate is a full context window; 3-agent team ~3-4x tokens; plan approval phases ~7x tokens

Changelog Signal

  • v2.1.32: Initial release (research preview)
  • v2.1.33: tmux messaging fix, TeammateIdle/TaskCompleted hooks added
  • Subsequent versions: Memory leak fix, Bedrock/Vertex/Foundry compatibility
  • v2.1.101: Permission inheritance fix, /team-onboarding command

Active bug fixes suggest ongoing investment, but the pace is incremental rather than rapid. The feature remains behind an experimental flag with no announced timeline for GA.

Roadmap Signal

  • Managed Agents (launched April 8, 2026 in public beta) is Anthropic's production-grade multi-agent offering — suggests Agent Teams may remain a power-user/local feature
  • Community requests for role-based model selection (lead=Opus, workers=Sonnet, tests=Haiku) not yet addressed
  • No announced plans for session resumption, nested teams, or leadership transfer

Cost Profile

| Scenario | Token Multiplier |
|---|---|
| 3 teammates, 30 min | ~3-4x single session |
| Plan approval phases | ~7x standard |
| 5-6 tasks per teammate | Recommended sweet spot |
| 16 agents (C compiler scale) | $20K over 2 weeks, 2B input + 140M output tokens |

Recommended team size: 3-5 teammates for most workflows. Having 5-6 tasks per teammate keeps everyone productive.


Key Architectural Insight

Agent Teams is fundamentally a file-system-based coordination protocol. Tasks are JSON files. Inboxes are JSON files. Team config is a JSON file. File locking prevents races. This makes the system inspectable, debuggable, and hackable — but also means it inherits file system limitations (no real-time guarantees, potential for stale reads, platform-dependent locking behavior).

The peer-to-peer messaging model (vs. hub-and-spoke in subagents) is the core architectural differentiator. It enables patterns like adversarial debugging, collaborative research, and self-organizing swarms that subagents cannot support.


Sources

title: "Subtask 2: Worktree Isolation Mechanics in Claude Code"
date: 2026-04-12
parent: claude-code-parallel-primitives
status: done

Summary

Claude Code provides first-class git worktree support for isolating parallel sessions and subagents. Worktrees create independent working directories with their own branch, index, and files while sharing the same repository history. There are three entry points: the --worktree CLI flag for user sessions, the isolation: worktree subagent frontmatter field, and the EnterWorktree tool available mid-session. Agent Teams (experimental) use worktrees implicitly for each teammate. Merge is manual by design -- Claude creates branches but the user (or a coordinating agent) decides how to integrate them. Cleanup is automatic for unchanged worktrees and prompt-driven for changed ones.


1. Worktree Creation

Entry points

| Method | Who uses it | Behavior |
|---|---|---|
| `claude --worktree <name>` / `claude -w <name>` | User at CLI | Creates worktree, starts interactive session inside it |
| `claude --worktree` (no name) | User at CLI | Auto-generates random name (e.g., "bright-running-fox") |
| `isolation: worktree` frontmatter | Custom subagent definition | Each invocation of that subagent auto-creates a worktree |
| "use worktrees for your agents" | Natural language prompt | Claude applies worktree isolation to subagent invocations |
| EnterWorktree tool | Mid-session (user says "work in a worktree") | Creates worktree and switches session's cwd into it |

Filesystem layout

Worktrees are created at:

<repo>/.claude/worktrees/<name>/

The branch is named worktree-<name> (e.g., EnterWorktree(name="invoice-pdf-export") creates branch worktree-invoice-pdf-export).

Best practice: add .claude/worktrees/ to .gitignore to prevent worktree contents from appearing as untracked files.

Base branch selection

The worktree branches from the default remote branch, which is wherever origin/HEAD points. This reference is set once during git clone and is not automatically updated if the remote's default branch changes.

To re-sync: git remote set-head origin -a
To override: git remote set-head origin your-branch-name

The base branch is not configurable through any Claude Code flag or setting. For per-invocation control, use a WorktreeCreate hook (see section 5).

.worktreeinclude -- copying gitignored files

Git worktrees are fresh checkouts and do not include untracked files. To copy gitignored files (.env, .env.local, secrets) into new worktrees, create a .worktreeinclude file in the project root using .gitignore syntax:

.env
.env.local
config/secrets.json

Rules: only files matching a pattern AND also gitignored get copied. Tracked files are never duplicated. This applies to --worktree, subagent worktrees, and Desktop app parallel sessions.

Note: .worktreeinclude is not processed when a custom WorktreeCreate hook is configured -- the hook replaces default behavior entirely, so copy files inside the hook script.


2. Parallel Agent Behavior

Subagent worktrees

When a subagent has isolation: worktree, each invocation gets its own independent worktree with its own branch. Multiple subagents fired in parallel each get separate worktrees -- they are fully independent:

  • Separate branch (worktree-<auto-name>)
  • Separate working directory
  • Separate git index
  • No shared state between them

Agent A can rewrite src/auth.ts while Agent B rewrites the same file in a different worktree. There is no coordination or conflict at the filesystem level.

Subagent results returned to parent

Subagents return a text summary to the parent agent. The parent does not automatically receive the diff or commit -- it gets the subagent's final text output. The worktree (with its branch and commits) persists on disk if changes were made, but the parent must explicitly interact with the branch (merge, cherry-pick, etc.) if it wants to integrate the changes.

Agent Teams

Agent Teams (experimental, requires CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1) give each teammate its own context window. The documentation says teammates work independently, each in its own worktree. The team lead coordinates and can merge results sequentially.

Key differences from subagent worktrees:

  • Teammates can message each other directly (subagents cannot)
  • Teammates have a shared task list with claim/dependency semantics
  • Teammates are full Claude Code sessions, not transient subagents
  • The lead merges results, similar to how developers work on separate branches and merge through PRs

3. Merge Semantics

No automatic merge

Claude Code does not automatically merge worktree branches back to the base branch. This is by design. The merge workflow is:

  1. Subagent/teammate works in its worktree, makes commits on its worktree-<name> branch
  2. When done, the worktree persists (if changes exist)
  3. The user or coordinating agent decides how to integrate:
    • git merge
    • git cherry-pick
    • Create a PR with gh pr create
    • Ask Claude to merge in the main session

Conflict handling

Since each worktree has its own branch, conflicts surface only at merge time. There is no built-in conflict resolution -- standard git merge conflict resolution applies. Prevention strategies from the ecosystem:

  • Scope each agent's work tightly (different files/modules)
  • Merge from main frequently to keep worktree branches up to date
  • Use the team lead to coordinate sequential merging

Agent Teams merge pattern

The team lead merges results sequentially -- conceptually similar to reviewing and merging PRs one at a time. Claude is reportedly good at handling merge conflicts when given gh CLI access and PR context. But this is not automated infrastructure; it depends on the lead's prompting and tool access.


4. Cleanup Behavior

User worktrees (--worktree)

On session exit:

  • No changes: worktree and branch removed automatically
  • Changes/commits exist: Claude prompts the user to keep or remove. Keeping preserves directory and branch. Removing deletes everything, discarding uncommitted changes and commits.

User-created worktrees (--worktree) are never removed by the automatic cleanup sweep.

Subagent worktrees

  • No changes: cleaned up automatically when the subagent finishes
  • Changes exist: worktree persists on disk for manual review

Orphaned subagent worktrees

Subagent worktrees orphaned by a crash or interrupted parallel run are removed automatically at startup, subject to:

  • Older than cleanupPeriodDays setting (default: 30 days, minimum: 1)
  • No uncommitted changes
  • No untracked files
  • No unpushed commits

If any of those conditions fail, the orphaned worktree is preserved.

ExitWorktree tool

The ExitWorktree tool provides two actions:

  • "keep": leaves worktree on disk for manual inspection
  • "remove": deletes worktree with safety checks for uncommitted changes/new commits

Known bugs (as of April 2026)

  • v2.1.101 fixed: claude -w <name> failing with "already exists" after a previous session's cleanup left a stale directory
  • v2.1.92 fixed: stale subagent worktree cleanup removing worktrees that contain untracked files
  • v2.1.89 fixed: subagents with worktree isolation leaking their working directory back to the parent session's Bash tool

5. Hooks: WorktreeCreate and WorktreeRemove

WorktreeCreate

Fires when: --worktree flag or isolation: "worktree" triggers worktree creation. Effect: replaces default git worktree logic entirely.

Input schema:

{
  "session_id": "abc123",
  "hook_event_name": "WorktreeCreate",
  "worktree_path": "/path/to/worktree",
  "source_path": "/path/to/source/repo",
  "branch": "feature-branch"
}

Output requirement: must print the absolute worktree path to stdout on exit 0. Failure: any non-zero exit code aborts worktree creation (unlike other hooks where only exit code 2 blocks). No matcher support -- always fires on every worktree creation.

WorktreeRemove

Fires when: worktree is being removed (session exit or subagent finish). Effect: cannot block removal -- failures are logged in debug mode only.

Input schema includes reason field: "session_exit" or "subagent_finish". No matcher support.

Use cases for hooks

  • Non-git VCS (SVN, Perforce, Mercurial): implement custom checkout/cleanup
  • Custom base branch per invocation
  • Database isolation (creating per-worktree DB copies)
  • Audit trail logging
  • Custom dependency installation
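For the custom-base-branch case, a hook script might look like the sketch below, which follows the input schema and output contract documented above (print the absolute worktree path on exit 0). The BASE_BRANCH variable is this sketch's own convention, and because a custom hook replaces default behavior, a real version would also have to copy any .worktreeinclude-style files itself.

```typescript
#!/usr/bin/env node
// WorktreeCreate hook sketch: create the worktree from a custom base branch.
import { execFileSync } from "node:child_process";
import { readFileSync } from "node:fs";

const input = JSON.parse(readFileSync(0, "utf8")); // hook payload on stdin
const base = process.env.BASE_BRANCH ?? "origin/develop"; // hypothetical default

try {
  execFileSync(
    "git",
    ["worktree", "add", "-b", input.branch, input.worktree_path, base],
    { cwd: input.source_path, stdio: "pipe" }
  );
  process.stdout.write(input.worktree_path); // contract: absolute path on stdout
  process.exit(0);
} catch (err) {
  console.error(`WorktreeCreate hook failed: ${err}`);
  process.exit(1); // any non-zero exit aborts worktree creation
}
```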

6. Monorepo Considerations

Dependencies (node_modules)

Git worktrees are fresh checkouts -- node_modules is not present. Each worktree needs its own dependency installation unless mitigated:

Setting: worktree.symlinkDirectories

{
  "worktree": {
    "symlinkDirectories": ["node_modules", ".cache"]
  }
}

Symlinks specified directories from the main repo into each worktree. Avoids duplicating large directories.

Known bug: Claude Code's atomic write pattern (write temp -> rename) replaces symlinks with regular files when it writes to a symlinked file. A PreToolUse hook workaround exists that redirects writes to the symlink target.

Setting: worktree.sparsePaths

{
  "worktree": {
    "sparsePaths": ["packages/my-app", "shared/utils"]
  }
}

Uses git sparse-checkout (cone mode) to write only listed directories to disk. Useful when a task needs only a subset of a large monorepo.

Build caches

Build caches (.cache, .next, .turbo, etc.) are not shared by default. Options:

  • Include in symlinkDirectories (risk: cache conflicts between parallel agents)
  • Let each worktree rebuild (safe but slow)
  • Use a WorktreeCreate hook for custom cache setup

pnpm global store

pnpm's global virtual store is particularly well-suited: each worktree's node_modules contains only symlinks into a single content-addressable store. Adding a new worktree is fast and costs almost no extra disk space.

Vite/Vitest gotcha

If using Vite or Vitest, add .claude/worktrees/** to excluded paths for tests, otherwise file watchers across all worktrees will spike CPU.
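One way to do that in a vitest.config.ts, keeping Vitest's defaults and adding the worktree directory (a sketch; fold it into whatever config already exists):

```typescript
import { defineConfig, configDefaults } from "vitest/config";

// Exclude Claude Code worktrees so watchers across parallel worktrees
// do not pick up each other's files.
export default defineConfig({
  test: {
    exclude: [...configDefaults.exclude, ".claude/worktrees/**"],
  },
});
```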


7. Key Constraints and Limitations

  1. Worktree sessions are transient: cannot be resumed via claude --resume. The session and its worktree are coupled to a single lifecycle.
  2. No nested worktrees: cannot call EnterWorktree from inside an existing worktree.
  3. Subagents cannot spawn subagents: a worktree-isolated subagent cannot itself fire subagents with their own worktrees.
  4. Base branch not configurable via flag: must use git remote set-head or a WorktreeCreate hook.
  5. No auto-merge: branches must be integrated manually or via coordinating agent.
  6. symlinkDirectories write bug: atomic writes replace symlinks with regular files.
  7. Agent Teams experimental: one team per session, no session resumption, no nested teams.

8. Decision Matrix for Operator Design

| Scenario | Recommended approach |
|---|---|
| Parallel read-only research | Subagents without worktrees (no filesystem conflicts) |
| Parallel code generation, different files | Subagents with isolation: worktree |
| Parallel code generation, overlapping files | Subagents with worktree + post-merge by parent |
| Complex multi-step with inter-agent communication | Agent Teams (experimental) |
| Monorepo with large deps | Worktree + symlinkDirectories + pnpm if possible |
| Non-git VCS | WorktreeCreate/WorktreeRemove hooks |
| Custom base branch per task | WorktreeCreate hook |

Sources

title: Can parallel subagents safely write code to the same branch?
date: 2026-04-12
parent: claude-code-parallel-primitives
subtask: 3
status: complete

Summary

Parallel subagents in Claude Code are genuinely concurrent -- they are not queued. They share the same working directory and filesystem by default. Without worktree isolation, concurrent code writes to the same branch are unsafe: file write races produce silent corruption, git index lock contention causes commit failures, and overlapping edits lead to data loss. The only sanctioned pattern for safe parallel code generation on a shared branch is strict file partitioning (each agent owns disjoint files). For anything beyond that, worktree isolation is required.

Finding 1: Subagents are truly concurrent, not queued

When Claude invokes the Agent tool (formerly Task tool) multiple times in a single response, those subagents fire concurrently. Wall-clock execution time equals the slowest subagent, not the sum.

Key evidence:

  • Official SDK docs state: "Multiple subagents can run concurrently, dramatically speeding up complex workflows" (Subagents in the SDK)
  • Technical analysis confirms: "Three subagents that each take thirty seconds finish in thirty seconds. Not ninety." (Medium: How the Task Tool Actually Distributes Work)
  • Each subagent is a separate API call running in its own context window. The parent agent does not wait for one to finish before starting the next.

The concurrency is at the process/API-call level. Claude decides when to parallelize based on task independence -- the developer defines the capability via agent definitions, not the scheduling.

Finding 2: Parallel subagents share the same working directory by default

Without isolation: worktree, all subagents operate on the same filesystem checkout as the parent session. There is no implicit sandboxing.

Key evidence:

  • Official docs: "A subagent starts in the main conversation's current working directory" (Create custom subagents)
  • The cd command does not persist between Bash calls within a subagent, but the base working directory is shared across all concurrent subagents.
  • Technical analysis: "Subagents inherit the parent session's tool permissions, including filesystem access."

Implication: two subagents spawned in the same turn can read and write the same files simultaneously. There is no filesystem-level locking, no copy-on-write, and no transaction semantics.

Finding 3: Concurrent writes to the same file produce silent corruption

This is the critical safety finding. When multiple subagents write to the same file concurrently, the result is nondeterministic garbage. There are no warnings, no errors, no conflict markers.

Key evidence:

The failure mode is last-write-wins at the OS level. The Edit tool does string replacement, so if Agent A's edit changes line counts before Agent B's edit executes, Agent B's old_string match may fail or match the wrong location.

Finding 4: Git index lock contention causes commit failures

Git uses .git/index.lock as a mutex. When two processes attempt to stage or commit simultaneously, the second process fails with fatal: Unable to create '.git/index.lock': File exists.

Key evidence:

  • GitHub issue #28823: Race condition with git index.lock during lint-staged pre-commit failures. Claude Code sees a failure, immediately retries, and hits the lock file.
  • Git's design: only one process can hold the index lock at a time. Concurrent git add or git commit calls will fail.
  • From lint-staged experience: "When chunked files are split up into enough groups the chance of a race condition on calling git add increases."

Even if two subagents write to completely disjoint files, they cannot safely run git add and git commit at the same time on the same checkout. The git index is a single shared resource.
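If the operator cannot fully serialize git operations, the pragmatic mitigation is to treat index.lock contention as transient and retry with backoff. A sketch (the retry counts and delays are arbitrary choices, and this does not make concurrent writers safe -- it only smooths over short-lived lock collisions):

```typescript
import { execFileSync } from "node:child_process";

// Sketch: retry git commands that fail on .git/index.lock contention.
async function gitWithRetry(args: string[], attempts = 5): Promise<string> {
  for (let i = 0; i < attempts; i++) {
    try {
      return execFileSync("git", args, { encoding: "utf8" });
    } catch (err: any) {
      const stderr: string = err?.stderr ?? "";
      if (!stderr.includes("index.lock") || i === attempts - 1) throw err;
      // Exponential backoff with jitter before the next attempt.
      await new Promise((r) => setTimeout(r, 200 * 2 ** i + Math.random() * 100));
    }
  }
  throw new Error("unreachable");
}

// Example: await gitWithRetry(["add", "src/api/handler.ts"]);
```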

Finding 5: Known patterns for safe parallel code generation

Pattern 1: Strict file partitioning (works without worktrees)

Each subagent is assigned exclusive ownership of specific files/directories. No two agents touch the same file. Commits are serialized after all agents complete.

  • "Parallel dispatch requires 3+ unrelated tasks with no shared state between tasks and clear file boundaries with no overlap."
  • Domain-based example: Frontend agent owns src/components/**, Backend agent owns src/api/**, Database agent owns src/db/**.
  • Limitation: agents cannot touch shared configuration files, routing tables, barrel exports, or any other "hotspot" files.
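Enforcing that partition before dispatch is cheap to check. A sketch using path-prefix ownership (the ownership map is this sketch's convention; real plans would be derived from task descriptions):

```typescript
// Sketch: verify that planned file ownership is disjoint before fanning out
// parallel agents on a shared checkout. Any overlap means falling back to
// worktree isolation instead of file partitioning.
type OwnershipPlan = Record<string, string[]>; // agent name -> owned path prefixes

export function findOverlaps(plan: OwnershipPlan): string[] {
  const conflicts: string[] = [];
  const entries = Object.entries(plan);
  for (let i = 0; i < entries.length; i++) {
    for (let j = i + 1; j < entries.length; j++) {
      const [agentA, pathsA] = entries[i];
      const [agentB, pathsB] = entries[j];
      for (const a of pathsA) {
        for (const b of pathsB) {
          // Prefix containment means both agents could touch the same files.
          if (a.startsWith(b) || b.startsWith(a)) {
            conflicts.push(`${agentA}:${a} overlaps ${agentB}:${b}`);
          }
        }
      }
    }
  }
  return conflicts;
}

// Example: findOverlaps({ frontend: ["src/components/"], backend: ["src/api/"] }) // []
```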

Pattern 2: Worktree isolation (recommended for code writes)

Set isolation: worktree in the subagent frontmatter. Each subagent gets its own branch, working directory, and git index.

  • "With worktree isolation, each agent has the entire codebase to itself." (Claude Code Worktrees)
  • Git worktrees share the .git object store but have independent indexes and HEADs.
  • Constraint: two worktrees cannot check out the same branch simultaneously.
  • Cleanup: worktrees with no changes are auto-removed when the subagent finishes.

Pattern 3: Read-only parallel, sequential writes

Use parallel subagents for research/analysis (read-only tools), then have the parent agent apply changes sequentially based on subagent findings.

  • This is the safest pattern when multiple agents need to inform changes to the same files.
  • Subagent tools field is restricted to Read, Grep, Glob -- no Edit, Write, or Bash.

Pattern 4: File-based coordination with post-merge (Anthropic's C compiler approach)

From Anthropic's Building a C Compiler project:

  • 16 agents ran in parallel, each in its own Docker container with a cloned repo.
  • File-based locking: agents wrote text files to current_tasks/ to claim work.
  • Git's built-in synchronization forced conflict resolution: agents pull upstream, merge, then push.
  • "Merge conflicts are frequent, but Claude is smart enough to figure that out."
  • This approach explicitly accepts and resolves conflicts rather than preventing them.
  • Critical limitation: when all agents hit the same bug (e.g., compiling Linux kernel), parallelization collapses because every agent overwrites the same fix.

Pattern 5: Agent Teams with shared task list

For the most complex coordination needs:

  • Each teammate gets its own context window and (optionally) its own worktree.
  • Shared task list with file-based locking prevents duplicate claims.
  • Teammates can message each other directly.
  • Sequential task claiming prevents race conditions on the task list itself.

Finding 6: What the operator should do

Given these findings, for the operator agent's parallel execution design:

| Scenario | Recommended approach |
|---|---|
| Parallel research/analysis (no writes) | Subagents with read-only tools, no isolation needed |
| Parallel code generation to disjoint files | Subagents with file partitioning, sequential commits after completion |
| Parallel code generation with possible overlap | isolation: worktree on each subagent, merge after completion |
| Complex multi-agent coordination | Agent Teams (experimental) |
| Single-file changes from multiple perspectives | Sequential subagents, not parallel |

The key insight: parallelism is safe for reads, dangerous for writes, and requires explicit isolation for overlapping writes. Claude Code provides no implicit safety net -- the orchestrator (operator) must enforce the boundaries.

Open questions

  • How does worktree merge actually work when the parent agent receives results? Is it automatic or does the parent need to git merge manually?
  • What is the performance cost of worktree creation/teardown for short-lived subagents?
  • Can the operator detect file overlap before dispatching and dynamically choose isolation vs. partitioning?

Sources

What Supervisor/Orchestration Patterns Exist for Multi-Agent Coordination?

Research date: 2026-04-12 Status: Complete

Overview

Multi-agent orchestration is the central design problem in agentic AI. Anthropic's own analysis of 200+ enterprise deployments found that 57% of project failures originated in orchestration design — agents were individually capable but poorly coordinated. This document maps the landscape of supervisor/coordination patterns across Claude Code's native primitives, Anthropic's published guidance, competing frameworks, and the broader industry, with specific attention to how these patterns inform the operator agent's parallel execution design.


1. Claude Code Native Orchestration Primitives

Claude Code offers three distinct mechanisms for multi-agent coordination, each at a different abstraction level.

1a. Subagents (Agent Tool)

The simplest delegation primitive. A parent agent spawns a child via the Agent tool; the child runs in its own context window and returns a summary to the parent.

Architecture:

  • Parent fires Agent tool call with a prompt, optional model/tools/isolation config
  • Child gets its own context window with a custom system prompt
  • Child works independently, returns results as a string summary
  • Parent never sees the child's internal reasoning, only the compressed output
  • Children cannot spawn their own subagents (no nesting)

Parallel execution: Multiple Agent tool calls in a single message fire concurrently. This is the key primitive — Claude can invoke 3-5 subagents simultaneously if the tool calls appear in the same response.

Configuration surface (frontmatter fields):

  • tools / disallowedTools — restrict capabilities
  • model — route to cheaper/faster models (Haiku for exploration, Sonnet for implementation)
  • isolation: worktree — git worktree isolation for filesystem safety
  • maxTurns — bound execution length
  • permissionMode — control approval behavior
  • background: true — run without blocking the parent
  • skills — inject domain knowledge
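Rendered as a TypeScript type for reference only -- the real configuration lives in YAML frontmatter of an agent markdown file, and field names should be checked against the current docs:

```typescript
// Illustrative only: mirrors the frontmatter fields listed above.
interface SubagentFrontmatter {
  tools?: string[];            // allow-list, e.g. ["Read", "Grep", "Glob"]
  disallowedTools?: string[];  // deny-list alternative
  model?: "haiku" | "sonnet" | "opus";
  isolation?: "worktree";      // git worktree isolation for filesystem safety
  maxTurns?: number;           // bound execution length
  permissionMode?: string;     // control approval behavior
  background?: boolean;        // run without blocking the parent
  skills?: string[];           // inject domain knowledge
}
```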

Fan-in semantics: Results return as strings to the parent's context. The parent must synthesize/aggregate. No structured merge — it's free-form text compression.

Key limitation: No inter-child communication. All coordination must flow through the parent. This creates a bottleneck when workers need to share intermediate discoveries.

1b. Agent Teams (Experimental)

A higher-level coordination primitive shipped February 2026. One session acts as team lead; teammates are independent Claude Code instances that communicate peer-to-peer.

Key differences from subagents:

| Dimension | Subagents | Agent Teams |
|---|---|---|
| Communication | Report to parent only | Peer-to-peer messaging + shared task list |
| Coordination | Parent manages all | Self-coordinating via task claims |
| Context | Own window; results compressed back | Own window; fully independent |
| Lifecycle | Ephemeral per invocation | Persistent for team duration |
| Cost | Lower (compressed returns) | Higher (each is a full Claude instance) |
| Best for | Focused tasks where only result matters | Complex work requiring discussion |

Coordination primitives:

  • Shared task list with pending/in-progress/completed states and dependency tracking
  • File-locking for task claims (prevents race conditions)
  • Mailbox system for direct and broadcast messaging
  • Plan approval gates — teammates can be required to plan before implementing
  • Hooks — TeammateIdle, TaskCreated, TaskCompleted for programmatic quality gates
  • Auto-unblocking — completing a dependency auto-unblocks downstream tasks

Current limitations:

  • Experimental (env var opt-in)
  • No session resumption with in-process teammates
  • No nested teams
  • One team per session
  • Lead cannot be transferred
  • Task status can lag (teammates sometimes fail to mark completion)

1c. Manual Parallel Sessions (Git Worktrees)

The lowest-level primitive: multiple independent Claude Code sessions running in separate git worktrees. No automated coordination — the human orchestrates.

When this is appropriate: Maximum isolation, maximum human control, no inter-agent coordination needed.

Comparison Matrix

| Pattern | Parallelism | Communication | Coordination | Token Cost | Human Effort |
|---|---|---|---|---|---|
| Subagents | Yes (same message) | Parent only | Parent manages | Low-Medium | Low |
| Agent Teams | Yes (persistent) | Peer-to-peer | Self-coordinating | High | Medium |
| Manual Worktrees | Yes (manual) | None | Human | Variable | High |

2. Anthropic's Published Orchestration Patterns

2a. "Building Effective Agents" — The Canonical Patterns

Anthropic's foundational guide identifies five composable patterns, plus a meta-recommendation about simplicity.

Orchestrator-Workers Pattern: A central LLM dynamically breaks down tasks, delegates to worker LLMs, and synthesizes results. The key distinction from parallelization is flexibility — subtasks aren't pre-defined but determined by the orchestrator based on the specific input.

  • Best for: coding changes across multiple files, research tasks with uncertain scope
  • The orchestrator decides what to parallelize at runtime
  • Workers report back; orchestrator synthesizes

Parallelization Pattern (two variations):

  1. Sectioning — break work into independent parallel subtasks (e.g., guardrails checking + query processing simultaneously)
  2. Voting — run identical tasks multiple times for diversity/confidence (e.g., code vulnerability reviews from multiple prompts)

Routing Pattern: Classify inputs and route to specialized handlers. Prevents optimization for one category from degrading others. Can route to different models (Haiku for simple, Sonnet for complex).

Meta-recommendation: "Start with single LLM calls. Add complexity only when measurable performance gains justify the cost increase." Multi-agent adds latency and cost — validate the tradeoff.

2b. Multi-Agent Research System

Anthropic's internal research system uses an orchestrator-worker pattern with these specifics:

Architecture:

  • Lead agent analyzes queries, develops strategies, spawns 3-5 subagents in parallel
  • Subagents use 3+ tools in parallel (reducing research time by up to 90% for complex queries)
  • A dedicated CitationAgent processes documents for source attribution

Task decomposition rules:

  • Simple queries: 1 agent, 3-10 tool calls
  • Comparisons: 2-4 subagents, 10-15 calls each
  • Complex research: 10+ subagents with clearly divided responsibilities

Critical finding: Lead agents currently execute subagents synchronously — waiting for each batch to complete before proceeding. This simplifies coordination but creates bottlenecks.

Performance: Opus 4 leading + Sonnet 4 subagents outperformed single-agent Opus 4 by 90.2% on internal research evaluation.

Cost reality: Multi-agent systems use ~15x more tokens than chat interactions. Token usage alone explains 80% of performance variance.

2c. Harness Design for Long-Running Apps

Anthropic's three-agent harness (Planner -> Generator -> Evaluator) reveals key orchestration principles:

Architecture:

  • Planner: Transforms brief prompts into comprehensive specs (10-16 features)
  • Generator: Implements features incrementally with version control
  • Evaluator: QA via Playwright MCP, testing like a user would

Key findings:

  • 20x cost premium: Full harness costs ~$200 vs $9 for solo agent — but produces dramatically better output
  • Self-evaluation fails: "Out of the box, Claude is a poor QA agent." Agents confidently praise their own mediocre work. Separating generator from evaluator is essential.
  • GAN-inspired pattern: Generator-evaluator creates genuine feedback loops. The evaluator required careful calibration toward appropriate skepticism.
  • File-based handoffs: Agents communicate through structured files, not conversation. One writes; the next reads.
  • Sprint contracts: Generator and evaluator negotiate explicit deliverables and success criteria before implementation begins.

Evolution insight: "Every component in a harness encodes an assumption about what the model can't do on its own, and those assumptions are worth stress testing." Better models don't eliminate scaffolding — they shift where complexity lives.

2d. Managed Agents — Brain/Hands/Session

The newest Anthropic architecture (2026) virtualizes the agent into three decoupled components:

  • Brain (Harness): Stateless reasoning service. execute(name, input) -> string interface.
  • Hands (Execution): Sandboxes, containers, tools — all interchangeable through standardized interfaces.
  • Session (State): Append-only event log. Lives outside the harness. Enables rewind, failover, parallel reasoning instances.

Multi-agent implications:

  • Multiple stateless brains can connect to the same session via wake(sessionId)
  • Brains can "pass hands to one another" through tool interfaces
  • Agents coordinate through the shared session event log (implicit message bus)
  • p50 TTFT dropped ~60%, p95 dropped >90% from decoupling brain from hands

This architecture is the meta-framework: unopinionated about specific harness implementations but strict about stable interfaces.


3. External Framework Patterns

3a. LangGraph — Graph-Based State Machine Orchestration

LangGraph treats agents as nodes in a directed graph with typed state flowing through edges.

Supervisor pattern:

  • Central supervisor agent coordinates multiple specialized agents
  • Controls all communication flow and task delegation
  • Decides which agent to invoke based on current context
  • Maintains shared, persistent state across the workflow
  • Built-in checkpointing with time travel (replay/rewind)

Fan-out/fan-in:

  • Send API for dynamic fan-out to multiple nodes
  • State merging with reducer functions for fan-in
  • Async nodes for parallel execution
  • Claimed 60% latency reduction using parallel patterns

State management: Explicit, reducer-driven state schemas using TypedDict and Annotated types. Reducer functions prevent data loss during concurrent updates. This is the key differentiator — structured state over free-form text.

Production characteristics: Maximum control, compliance-ready, production-grade state management. Higher learning curve.

3b. CrewAI — Role-Based Crew Orchestration

CrewAI uses a role metaphor: agents have roles, backstories, and goals. Crews execute tasks through processes.

Process types:

  • sequential — tasks execute in order
  • hierarchical — manager agent delegates to workers
  • consensual — agents negotiate (experimental)

Characteristics: Rapid prototyping, intuitive abstraction. But: consumed nearly 2x tokens and 3x+ time in benchmarks due to multi-step verification overhead.

3c. OpenAI Agents SDK — Handoffs and Agents-as-Tools

Successor to Swarm (March 2025). Two core orchestration primitives:

Agents as Tools (Manager Pattern):

  • Manager agent maintains control, invokes specialists via Agent.as_tool()
  • Specialist helps with bounded subtask but doesn't take over conversation
  • Manager combines outputs, enforces unified guardrails

Handoffs (Delegation Pattern):

  • Triage agent routes to specialist who becomes the active agent
  • Specialist owns the next part of the interaction
  • Context history collapsed into single message during handoff (v0.6.0 breaking change)

Hybrid: A triage agent hands off to a specialist, and that specialist can still call other agents as tools for narrow subtasks.

Swarm vs. Supervisor:

  • Supervisor: Central coordinator routes all tasks based on runtime state
  • Swarm: Each agent encapsulates its own routing logic, decides independently when to transfer control
  • Swarm distributes routing intelligence; supervisor concentrates it

3d. Google ADK — Workflow Agent Primitives

Google's Agent Development Kit (April 2025) provides explicit workflow agent types:

  • SequentialAgent: Executes sub-agents in predefined order (Parser -> Extractor -> Summarizer)
  • ParallelAgent: Runs sub-agents simultaneously (Security Auditor + Style Enforcer + Performance Analyst, then Synthesizer merges)
  • LoopAgent: Iterative execution with exit signals (escalate=True for early completion, max_iterations as hard limit)

These are the most explicit fan-out/fan-in primitives in any framework — named types rather than patterns you compose.

3e. XState/Stately Agent — Deterministic Core, Agentic Shell

The pattern from David Fetterman (February 2026) applies Gary Bernhardt's "Functional Core, Imperative Shell" to agents:

  • Deterministic Core: XState machines handle workflow logic, state transitions, business rules
  • Agentic Shell: LLM agents manage conversation flow and natural language interpretation
  • Tools as membrane: Translate agent intent to machine events

Key insight: The machine enforces rules while the agent handles creative work. Two critical tools: get_current_state (query machine) and take_action (send typed events). Dynamic tool swapping based on machine state — agent capabilities change as the workflow progresses.
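A minimal sketch of that membrane in XState v5 terms, assuming a toy task workflow. The state names and events are invented for illustration, but the two-tool surface (get_current_state, take_action) follows the pattern described above:

```typescript
import { createMachine, createActor } from "xstate";

// Deterministic core: the workflow shape is fixed here, not decided by the LLM.
const taskMachine = createMachine({
  id: "task",
  initial: "pending",
  states: {
    pending: { on: { START: "inProgress" } },
    inProgress: { on: { SUBMIT: "qa", BLOCK: "blocked" } },
    qa: { on: { PASS: "done", FAIL: "inProgress" } },
    blocked: { on: { UNBLOCK: "inProgress" } },
    done: { type: "final" },
  },
});

const actor = createActor(taskMachine).start();

// Agentic shell: the only two capabilities the agent sees.
const get_current_state = () => String(actor.getSnapshot().value);

const take_action = (event: { type: "START" | "SUBMIT" | "BLOCK" | "UNBLOCK" | "PASS" | "FAIL" }) => {
  actor.send(event);          // events with no matching transition leave the state unchanged
  return get_current_state(); // the machine, not the agent, decides what state comes next
};
```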

Production architecture:

Non-deterministic (thin) -> Deterministic (thick) -> Storage
Agent interprets        -> Machine validates    -> Postgres persists

4. Pattern Taxonomy: State Machine vs. LLM-Driven Orchestration

The fundamental axis in orchestration design is who decides what happens next.

State Machine Driven

The orchestrator is a state machine executing predetermined steps. LLM is called only at bounded decision points.

Characteristics:

  • Explicit state transitions, fully auditable
  • Deterministic routing (same state + event = same transition)
  • Testable without LLM involvement
  • Failure modes are enumerable
  • Cannot adapt to unforeseen situations

Best for: Compliance-critical workflows, multi-step sequences requiring audit trails, known-shape processes.

LLM Driven

The agent decides when to call what tool, plans ahead, reflects on results, backtracks or revises strategy.

Characteristics:

  • Handles unstructured problems with uncertain paths
  • Can adapt to novel situations
  • Non-deterministic — 1% failure compounds (10-step process = ~90.4% success)
  • Harder to audit and debug
  • More expensive (reasoning overhead)

Best for: Open-ended tasks, creative work, research with unknown scope.

Hybrid (The Production Pattern)

The dominant 2025-2026 pattern: rigid orchestrator containing intelligent routing agents.

  • State machine handles workflow shape (what phases exist, what order)
  • LLM handles decisions within phases (how to implement, what to search for)
  • Tools bridge the two (machine validates agent intent)

This is exactly what the operator already does. The operator is a stateless loop reading XState machines via CLI. Each state is a skill name. The LLM decides how to execute the skill, but the machine decides what skill runs next. Circuit breaker is a machine guard, not LLM judgment (except for cross-task trips).


5. Fan-Out / Fan-In Patterns Across Systems

The "fan-out work, fan-in results" pattern appears in every framework but with different semantics:

| System | Fan-Out Mechanism | Fan-In Mechanism | State During Parallel |
|---|---|---|---|
| Claude Code Subagents | Multiple Agent tool calls in same message | Parent reads all return strings, synthesizes | No shared state; parent waits |
| Claude Code Agent Teams | Team lead spawns teammates + shared task list | Lead synthesizes; teammates can message each other | Shared task list + mailbox |
| LangGraph | Send API to multiple nodes | Reducer functions merge typed state | Explicit state schema with reducers |
| OpenAI Agents SDK | asyncio.gather() for parallel agent calls | Manager combines outputs | Conversation history handoff |
| Google ADK | ParallelAgent wrapper | Synthesizer agent post-parallel | ADK session state |
| Anthropic Harness | Sequential (Planner -> Generator -> Evaluator) | File-based handoffs | Files on disk |
| Anthropic Managed Agents | Multiple stateless brains on same session | Shared event log | Append-only session events |

Key finding: The synthesis step is where most fan-in systems fail. An aggregator told to "combine everything" without quality criteria, priority rankings, or conflict resolution rules produces bloated or arbitrary output. Effective fan-in requires explicit merge policies.


6. The Operator's Current Pattern and Extension Points

The existing operator (agents/operator.md) implements a state-machine-driven sequential orchestrator:

Read State -> Pick Next Task -> Invoke Skill (Agent tool) -> Read Result -> Advance Machine -> Loop

What it already does well:

  • Clean separation: XState machines own state, operator owns coordination, skills own execution
  • Stateless loop — all state in filesystem, resumable by design
  • Circuit breaker pattern with both automatic (retry exhaustion) and judgment (blocked/WIP) trips
  • File-based handoffs (progress.md, operator.md) for inter-agent communication
  • GAN-inspired generator-evaluator separation (do-work -> qa)

Where parallelism could enter:

  1. Independent tasks in parallel: When tasks have no dependencies, fire multiple Agent tool calls in one message. This uses the subagent primitive directly. The state ledger would need to support concurrent transitions.

  2. Parallel QA: Run QA on multiple completed tasks simultaneously. Each QA agent gets its own worktree isolation.

  3. Agent Teams for complex features: For features with many interrelated tasks, the operator could spawn a team instead of sequential skill invocations. The team lead would be the operator itself, with teammates owning individual tasks.

  4. Hybrid: parallel fan-out for independent tasks, sequential for dependent chains. Parse the dependency graph from tasks.md, identify independent sets, fan-out each set, fan-in results, advance all machines, repeat.
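A sketch of option 4's scheduling step, assuming the dependency graph has already been parsed out of tasks.md into task objects (the Task shape and function name here are illustrative):

```typescript
interface Task {
  id: string;
  dependsOn: string[]; // task ids this task waits on
}

// Group tasks into waves: every task in a wave has all of its dependencies
// satisfied by earlier waves, so tasks within one wave can be dispatched in parallel.
function parallelWaves(tasks: Task[]): string[][] {
  const pending = new Map<string, Set<string>>();
  for (const t of tasks) pending.set(t.id, new Set(t.dependsOn));

  const waves: string[][] = [];
  while (pending.size > 0) {
    const ready = [...pending.entries()]
      .filter(([, deps]) => deps.size === 0)
      .map(([id]) => id);
    if (ready.length === 0) throw new Error("cycle in task dependencies");
    waves.push(ready);
    for (const id of ready) pending.delete(id);
    for (const deps of pending.values()) for (const id of ready) deps.delete(id);
  }
  return waves;
}

// Example: T1 and T2 are independent, T3 depends on both.
// parallelWaves(tasks) -> [["T1", "T2"], ["T3"]]
```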


7. Key Recommendations

For the operator's parallel execution design:

  1. Start with subagent fan-out for independent tasks. This is the lowest-risk addition: fire multiple Agent tool calls when the dependency graph says tasks are independent. No new primitives needed — just analyze the task DAG before picking the next task.

  2. Use worktree isolation for parallel code generation. Each parallel subagent should get isolation: worktree to prevent filesystem conflicts. Merge happens after all return.

  3. Don't use Agent Teams for the operator — yet. Agent Teams adds coordination overhead and experimental instability. The operator's value is deterministic execution. Agent Teams is better for exploratory/research phases where inter-agent discussion adds value.

  4. Keep the state machine as the authority. The hybrid pattern (deterministic core, agentic shell) is exactly right. The machine decides task ordering and transitions; the LLM decides how to execute each task. Parallelism is a scheduling optimization, not an architectural change.

  5. Design explicit fan-in policies. When parallel tasks return, the operator needs merge rules: are results independent (just advance each machine separately)? Do they need synthesis? What happens if one fails and others succeed? One possible policy is sketched after this list.

  6. Budget for 15-20x token cost. Anthropic's own data: multi-agent = ~15x chat tokens. Parallel fan-out multiplies this by the parallelism factor. Set per-agent token limits and overall budget caps.
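For recommendation 5, one possible fan-in policy for independent tasks, sketched with placeholder hooks into the operator's existing machinery. advanceMachine and recordRetry stand in for the XState CLI call and the circuit-breaker bookkeeping; they are not real commands.

```typescript
interface FanoutResult {
  taskId: string;
  status: "ok" | "failed";
  summary: string; // the subagent's returned string
}

// Independent tasks need no synthesis: each machine advances on its own,
// and one failure does not block or roll back the successes.
function fanIn(results: FanoutResult[]): void {
  for (const result of results) {
    if (result.status === "ok") {
      advanceMachine(result.taskId, "SUBMIT"); // hand the task to QA
    } else {
      recordRetry(result.taskId); // counts toward the circuit breaker's retry limit
    }
  }
}

declare function advanceMachine(taskId: string, event: string): void;
declare function recordRetry(taskId: string): void;
```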


Sources

topic What are the token cost and context implications of parallel vs. sequential execution?
parent [[claude-code-parallel-primitives/research]]
date 2026-04-12
type research-subtask
status complete

Token Cost and Context Implications of Parallel vs. Sequential Execution

Parallel execution in [[Claude Code]] fundamentally changes the cost equation compared to sequential execution. The core tension: parallel agents trade token efficiency for wall-clock speed. Each parallel agent operates as a separate Claude API session with its own [[context window]], so costs scale with concurrency rather than being amortized into a single growing context.

Cost Model: Multiplicative, Not Additive

Parallel execution produces roughly N-times token consumption for N agents, plus coordination overhead. The scaling is not purely multiplicative because different primitives have different overhead profiles:

[[Subagents]] (Agent tool): Each subagent runs in its own fresh context window. The parent agent's conversation history does not carry over -- only the Agent tool's prompt string provides context. The subagent's final message returns verbatim as the tool result. Token cost = (system prompt + CLAUDE.md + tool definitions + spawn prompt + subagent's own tool calls and reasoning) per subagent. For N parallel subagents, total input tokens are roughly N times the per-subagent cost, not N times the parent's accumulated context.

[[Agent Teams]]: Each teammate is a fully independent [[Claude Code]] instance with its own 1M-token context window. Official documentation states agent teams use approximately 7x more tokens than standard sessions when teammates run in plan mode, though this number varies with team size. A 4-agent team uses 3-4x tokens of a single session; a 5-agent team can reach 5-7x. The multiplier exceeds the raw agent count because of coordination overhead: mailbox messages, task list polling, and broadcast communications all consume additional tokens.

Sequential execution has a different cost profile: a single growing context where each subsequent API call re-sends the entire conversation history. The Nth sequential call costs proportionally more because the context has grown. However, [[prompt caching]] dramatically reduces the effective cost of re-sent content (cache reads cost 0.1x base input price).

Prompt Cache Behavior: Parallel Agents Do Not Share Cache

This is one of the most important findings for cost optimization. [[Prompt caching]] in Claude has several properties that interact poorly with parallel execution:

  1. Cache isolation: Caches are isolated per workspace (as of February 2026). Different organizations never share caches. Within a workspace, cache hits require 100% identical prompt prefixes -- if any byte changes, the cache misses entirely.

  2. Parallel cold-start problem: A cache entry only becomes available after the first response begins streaming. For concurrent requests, Anthropic's documentation explicitly recommends waiting for the first response before sending subsequent requests to ensure cache hits. Parallel subagents firing simultaneously will each cold-start their own cache writes rather than benefiting from a shared cache.

  3. Per-model cache separation: Caches are per-model. If the parent runs on [[Opus]] and subagents run on [[Sonnet]] or [[Haiku]], each model builds its own cache from scratch. Switching a subagent to Haiku for cost savings means Haiku cannot reuse Opus's cached context.

  4. What does get cached across parallel agents: Tool definitions and system prompts that are identical across subagents will share cache -- but only if the requests are serialized (first response completes before second request fires). In practice, truly parallel subagents each pay the 1.25x cache write cost independently.

Quantitative impact: Cache reads cost $0.50/MTok on Opus 4.6 vs. $5/MTok for uncached input (10x cheaper). Cache writes cost $6.25/MTok (1.25x base). For a parallel fan-out of 5 subagents each processing 100K tokens of shared system prompt, sequential execution would pay 1 cache write + 4 cache reads ($0.825 total), while parallel execution pays 5 independent cache writes ($3.125 total) -- roughly 3.8x more expensive on that shared prefix alone.
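The arithmetic behind those figures, spelled out with the prices quoted above and the token counts from the example:

```typescript
const prefixMTok = 0.1;   // 100K tokens of shared system prompt
const cacheWrite = 6.25;  // $/MTok, 1.25x the $5/MTok base input rate
const cacheRead = 0.5;    // $/MTok, 0.1x the base input rate
const agents = 5;

// Sequential: the first call writes the cache, the next four read it.
const sequentialCost = prefixMTok * (cacheWrite + (agents - 1) * cacheRead); // 0.825
// Parallel: every agent cold-starts and pays its own cache write.
const parallelCost = prefixMTok * agents * cacheWrite;                       // 3.125

console.log((parallelCost / sequentialCost).toFixed(1)); // "3.8"
```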

Context Window Pressure and Subagent Result Compression

When subagent results return to the parent, context pressure depends on the execution primitive:

Subagents (Agent tool): The parent receives the subagent's final message verbatim as the Agent tool result. All intermediate tool calls, file reads, and reasoning stay in the subagent's context and never enter the parent. This is the primary context-saving mechanism -- a subagent might read 15 files (150K tokens internally) but return only a 2K-token summary. The parent's context grows only by the size of the returned result, not the subagent's working context.

Agent Teams: Results flow through the mailbox messaging system, not as direct context injection. Each teammate is fully independent -- the parent (team lead) receives messages from teammates but does not inherit their full context. The trade-off is that Agent Teams cannot return results as compactly as subagents because the communication is message-based rather than tool-result-based.

Compaction: [[Auto-compaction]] triggers when context usage hits approximately 83.5% of the window (the buffer is ~33K tokens on a 200K window). Compaction produces a summary typically 60-80% shorter than the original. In multi-agent scenarios, compaction happens independently per agent -- each agent compacts its own context without awareness of other agents' state. The API-level compaction (beta: compact-2026-01-12) allows configuring context_token_threshold with a default of 150K tokens and minimum of 50K.

The 20x Cost Finding: Does Parallelism Multiply It?

Anthropic's [[harness design]] article documented a dramatic cost differential: a solo agent run cost $9 over 20 minutes, while the full three-agent harness (planner + generator + evaluator) cost $200 over 6 hours -- roughly 22x more expensive. Updated results with [[Opus 4.6]] brought this down to $124.70 over 3 hours 50 minutes, with the breakdown:

| Agent Phase | Cost | Time |
|---|---|---|
| Planner | $0.46 | 4.7 min |
| Build Round 1 | $71.08 | 2h 7m |
| QA Round 1 | $3.24 | 8.8m |
| Build Round 2 | $36.89 | 1h 2m |
| QA Round 2 | $3.09 | 6.8m |
| Build Round 3 | $5.88 | 10.9m |
| QA Round 3 | $4.06 | 9.6m |

The 20x cost is not primarily from parallelism -- the harness runs agents sequentially (build then QA, iteratively). The cost comes from longer autonomous sessions with more tool calls, more code generation, and more evaluation cycles. The harness trades cost for quality: the evaluator catches concrete bugs (routing errors, untriggered functions) that a solo agent would miss.

Does adding parallelism to the harness pattern multiply the 20x? Not linearly. If you parallelized the build phase across 3 agents working on different features simultaneously, you would roughly triple the build-phase cost but potentially cut wall-clock time by 2-3x. The evaluator phase would remain sequential (it needs to see the combined output). So a parallelized harness might cost 1.5-2x more than the sequential harness, not 3x, because the QA and planning phases stay constant. The key insight from the article: with Opus 4.6, context resets between rounds were eliminated entirely -- "the agents were run as one continuous session across the whole build, with the Claude Agent SDK's automatic compaction handling context growth along the way."

Practical Cost Optimization Strategies

Model tiering is the highest-leverage optimization for multi-agent workflows:

| Role | Recommended Model | Cost Ratio vs. Opus |
|---|---|---|
| Orchestrator / complex reasoning | Opus 4.6 | 1.0x |
| Code generation / coordination | Sonnet 4.6 | 0.6x input, 0.6x output |
| Exploration / search / simple tasks | Haiku 4.5 | 0.2x input, 0.2x output |

The built-in Explore subagent already uses [[Haiku]] for read-only codebase search. Using Haiku for exploration subagents and Sonnet for implementation typically reduces costs 40-50% compared to using Sonnet for everything.

Context isolation is the second-highest leverage. Subagents that perform verbose operations (running tests, reading logs, fetching docs) should handle that content in their own context and return only summaries. The official docs explicitly recommend this pattern: "delegate verbose operations to subagents so the verbose output stays in the subagent's context while only a summary returns to your main conversation."

Specific strategies:

  1. Keep spawn prompts minimal: Everything in the spawn prompt adds to each subagent's context from the start. Include only file paths, error messages, and decisions the subagent needs.

  2. Cap extended thinking: Set MAX_THINKING_TOKENS=10000 for routine subagent tasks. Thinking tokens are billed as output tokens ($25/MTok on Opus), so an uncapped thinking budget on N parallel agents multiplies this cost N-fold.

  3. Prefer subagents over agent teams for focused tasks. Subagents return results directly to the parent with lower overhead. Agent teams add mailbox, task list, and coordination protocol overhead -- use them only when agents need to communicate with each other.

  4. Serialize cache-dependent requests when possible. If multiple subagents share a large system prompt, staggering their launch (waiting for the first to begin streaming) allows subsequent agents to hit the cache.

  5. Use /compact while cache is warm: Deploy compaction within 5 minutes of the last message while the prompt cache is still valid, rather than after the cache expires.

  6. Move stable instructions to skills: Skills load on-demand, while CLAUDE.md loads at session start for every agent. Specialized instructions in skills avoid inflating every subagent's base context.

  7. Clean up agent teams promptly: Active teammates continue consuming tokens even when idle, as they still process mailbox polls and task list checks.

Cost Comparison Summary

| Scenario | Relative Token Cost | Wall-Clock Speed | Cache Efficiency |
|---|---|---|---|
| Single agent, sequential | 1.0x (baseline) | Slowest | Best (full cache reuse) |
| N subagents, parallel | ~Nx + subagent overhead | Fastest | Poor (parallel cold-starts) |
| N subagents, staggered | ~Nx but better cache | Moderate | Good (serial cache hits) |
| Agent team (3 members) | 3-7x | Fast | Independent per agent |
| Agent team (5 members) | 5-15x | Fastest | Independent per agent |
| Harness (sequential multi-agent) | 10-22x | Slow but thorough | Moderate (context resets) |

The critical insight: parallelism primarily trades dollars for time, and the cost-time tradeoff is rarely linear because coordination overhead, cache misses, and independent context windows all add friction. The most cost-effective parallel pattern is focused subagents with model tiering -- Haiku for exploration, Sonnet for implementation, Opus only for orchestration.

Sources

  1. Manage costs effectively - Claude Code Docs -- Official cost management guide with agent team token costs, rate limit recommendations, and context management strategies
  2. Harness design for long-running application development - Anthropic Engineering -- Source of the 20x cost finding; detailed breakdown of multi-agent harness costs with Opus 4.5 and 4.6
  3. Prompt caching - Claude API Docs -- Complete prompt caching documentation including pricing multipliers, parallel request behavior, cache invalidation rules, and TTL semantics
  4. Create custom subagents - Claude Code Docs -- Subagent architecture, context isolation mechanics, built-in Explore/Plan agents, model selection for subagents
  5. Subagents in the SDK - Claude Code Docs -- SDK-level subagent documentation covering context isolation, parallelization, what subagents inherit vs. what they don't
  6. Orchestrate teams of Claude Code sessions - Claude Code Docs -- Agent Teams architecture, token usage scaling, communication overhead, comparison with subagents
  7. Pricing - Claude API Docs -- Complete model pricing for Opus 4.6, Sonnet 4.6, Haiku 4.5 including cache write/read rates and batch discounts
  8. Compaction - Claude API Docs -- API-level compaction controls, trigger thresholds, integration with prompt caching, token counting behavior
  9. Claude Code Sub Agents - Burn Out Your Tokens - DEV Community -- Real-world token consumption data showing parallel subagents exhausting Pro plan quota in 15 minutes vs. 30 minutes sequential
  10. Claude Code Cost Optimisation Guide - systemprompt.io -- Quantified optimization strategies: model selection saves ~40%, MAX_THINKING_TOKENS is the single biggest lever, typical daily costs $5-15 vs. $20-40 without optimization
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment