A lightweight framework for managing multi-session, multi-agent work in existing codebases using Claude Code.
Born from adapting a greenfield orchestration framework (agent prompt files, shell scripts, rigid file permissions) into something practical for real-world projects — where the codebase already exists, architecture is established, and most work is features and bug fixes, not building from scratch.
Claude Code sessions are stateless. Each new session starts blind — no knowledge of what the previous session did, what decisions were made, what's left to do. Agents guess when they hit ambiguity, compounding errors across tasks. Platform limitations get rediscovered every time someone touches the same code. Tasks get marked "done" at 80%. And when you dispatch subagents without structure, every agent defaults to "assistant that does everything" — reading and writing freely, picking arbitrary models, conflicting on the same files.
This framework addresses six gaps with minimal overhead.
Location: Convention in ~/.claude/AGENTIC.md + reinforcement hook ~/.claude/hooks/orchestrator.sh
What it solves: Without a coordination convention, the main chat context becomes the implementation context — the model that should be thinking is also writing boilerplate, grepping files, and fetching docs. This crowds out the reasoning capacity you need for architecture decisions, blocker resolution, and steering mid-task.
How it works: In v3, the main chat acts as the Orchestrator (Planner) by default, regardless of which model is selected. It thinks, briefs, and delegates. It does not write code directly.
Reflex rules — default to dispatch:
- Task touches 2+ files, or complex logic in 1 file → dispatch
builder-fastorbuilder-smart. Do not Edit yourself. - Search spanning >5 files, or tracing call chains → dispatch
finder. Do not Grep yourself. - Library docs, API references, CLI behavior → dispatch
researcher. Do not WebFetch yourself. - Running tests, validating DoD, checking logs → dispatch
tester. - Multi-step work (3+ steps) → apply the
/agenticpipeline at the inferred tier automatically. No manual command needed.
Exceptions — do it yourself:
- Answering a question that needs no file edits.
- Trivial one-line fix when the file is already in context (no search, no ambiguity).
- Reading one file at a known path to show the user.
- Quick JSON/config read, small single-line script fix.
- Meta-commands (slash commands, hook edits, settings tweaks).
Tiebreaker: If uncertain between "do it" and "dispatch" → dispatch. The user chose this framework so the expensive brain stays orchestrating and cheap hands do the work.
Flow for any non-trivial request:
- Read pre-warmed context once at task start: open handoffs, active blockers, current findings,
docs/KNOWN_ISSUES.md, currenttodo.md. - Infer tier (
trivial/medium/full— see Tier Semantics below). - Write a 2–5 line brief to
.claude/mytasks/todo.mdwith Definition of Done. - Dispatch subagents in background.
- Review subagent output, compose a tight answer for the user. Raw subagent output stays in their context, not yours.
Overrides:
/agentic <task> --tier=X— explicit tier control for one task."do it yourself"— override Orchestrator mode for one turn."off orchestrator"— disable until next session (main chat resumes executing directly).
Reinforcement hook: A tiny UserPromptSubmit hook (~/.claude/hooks/orchestrator.sh) reinforces the directive on work-verb prompts only — approximately 25 tokens injected conditionally, pure bash with optional jq, approximately 30ms. It bypasses on off orchestrator, do it yourself, and any /agentic, /handoff, /blocker, /known-issue, /init-agentic invocation. The directive itself lives in AGENTIC.md § Operating Mode; the hook is reinforcement against mid-session model drift, not the source of truth.
Location: .claude/mytasks/handoffs/<task-name>.md (gitignored)
What it solves: When you close Claude Code and open a new session — or when work spans multiple days — the next session has no context about prior work. It re-reads everything, re-discovers decisions already made, or worse, makes different decisions that conflict with what was already built.
How it works: When finishing a task that will continue in another session, write a handoff file. The canonical format (the SessionStart hook scans for these):
# <task>
## Status
<what was done>
## Next
- [ ] <next step>
- [ ] <next step>
## Open questions
<list or "None">
## Files touched
- <path>When continuing multi-session work: Check .claude/mytasks/handoffs/ first. If a handoff exists for your task, read it before doing anything else.
Cleanup: Delete handoff files once the feature is complete. They're scaffolding, not documentation.
Location: .claude/mytasks/blockers.md (gitignored)
What it solves: When an agent encounters ambiguity — unclear requirements, conflicting patterns in the codebase, platform unknowns — the default behavior is to guess and keep going. This compounds: Agent A guesses wrong, Agent B builds on that guess, and you discover the problem three steps later.
How it works: When an agent hits something it cannot resolve from code, docs, or git history, it writes the blocker to .claude/mytasks/blockers.md in canonical format and asks the user. If the user resolves it, the entry is removed and work continues. If not, the agent halts that task. The file ensures blockers survive between sessions — even if the chat is closed, the context survives.
The canonical format is load-bearing. The SessionStart hook uses grep -qE '^## [0-9]{4}-' to detect active blockers. The H2 header MUST start with ## followed by a 4-digit year — mismatched heading depth produces a silent false negative:
# Active Blockers
## 2026-04-07 14:30 — Notification grouping on Linux
- Context: Implementing push notification grouping (task 3)
- Blocker: Linux notification daemons handle grouping differently.
libnotify supports `desktop-entry` hints but not all DEs respect them.
Should we implement our own grouping logic or rely on the DE?
- What I need: A decision on whether to support grouped notifications
on Linux or only macOS/Windows.
- Files involved: `src/notifications/linux.ts`Fields: Context, Blocker, What I need, Files involved. Every field must be present. If something is unknown, write <unknown>.
Resolving blockers: The agent asks you first. If you can answer immediately, great — the blocker is removed and work continues. If you need time, the blocker stays in the file. Next session, the agent reads it and knows exactly what's still unresolved.
Cleanup: Remove resolved entries from the file. If the file is empty, the agent can proceed.
Location: docs/KNOWN_ISSUES.md (committed to git)
What it solves: Platform limitations, upstream API quirks, and "this doesn't work because X" get rediscovered every time someone touches that area of the code. The knowledge exists in someone's head or buried in a closed issue — but not where the next developer (or agent) will find it.
How it works: When you discover a persistent issue that isn't a task blocker but a fact of life in the project, document it:
# Known Issues
## Electron: optional chaining in preload scripts
- **Status**: Workaround in place
- **Issue**: Electron's preload sandbox doesn't support optional chaining
in all build targets. Rollup compiles it out for modern targets but
the preload script runs in a restricted context.
- **Workaround**: Use explicit null checks in preload scripts instead of `?.`
- **Affects**: `src/preload/`, any new preload entry points
- **Ref**: electron/electron#12345Key difference from agent blockers: Known issues are permanent project knowledge. They survive across branches, sprints, and team members. Agent blockers are ephemeral — they exist only while a task is in progress. Known issues belong in the repo; blockers belong in .claude/ (gitignored).
When to write one: When you find yourself thinking "someone's going to hit this again." If the issue is tied to the platform, an upstream dependency, or a fundamental constraint rather than your current task, it belongs here.
Location: Inside .claude/mytasks/todo.md (gitignored), inline with each task
What it solves: Agents declare tasks complete at 80% because there's no explicit criteria for "done." The remaining 20% — edge cases, cross-platform verification, test coverage — gets silently dropped.
How it works: Each task in your plan includes verifiable done criteria:
## Tasks
- [ ] Implement notification grouping on macOS
**Done when**: Notifications from the same channel appear grouped in
Notification Center. Verified with 3+ rapid messages. Tests pass.
- [ ] Add Linux fallback for notification grouping
**Done when**: On GNOME and KDE, notifications degrade gracefully to
ungrouped. No crashes. `yarn test` passes. Manual verification on
Ubuntu 24.04.
- [ ] Update notification preferences UI
**Done when**: New toggle appears in Settings > Notifications. Uses
Fuselage Toggle component. Preference persists across restarts.
Screenshot taken.The rule is simple: If you can't check it, it's not done. "Implement X" is not a definition of done. "Implement X, verify Y, tests pass" is.
Location: Convention in ~/.claude/AGENTIC.md, subagent files in ~/.claude/agents/
What it solves: Without role taxonomy, every dispatched agent defaults to "assistant that does everything" — they pick arbitrary models, read and write freely, and may conflict on the same files. Naming roles with explicit capability constraints gives the Orchestrator a dispatch vocabulary and prevents agents from overstepping.
The team analogy:
Think of it as a dev team. Juniors are cheap, fast, and plentiful — hand them well-defined tasks and let them run in parallel. Mid-level engineers handle the complex code and pre-screen work before it goes up. The senior architect opens each task with a clear brief and closes it with a final review — but only after the mid-level has already screened the work, so the senior never wastes time on fixable issues.
| Tier | Model profile | Who they are | What they do |
|---|---|---|---|
| Junior | Fast model — speed and cost-optimized for high-volume parallel work. Excels at narrow, mechanical tasks with clear instructions. Don't ask it to resolve ambiguity. (e.g. Claude Haiku, GPT-4o-mini, Gemini Flash) | Fast, parallel, plentiful | Finder, Researcher, simple Builders, Tester — well-defined scoped tasks |
| Mid-level | Smart model — capable coder with solid reasoning. Handles complex implementation, understands code context, writes non-trivial logic, and can review work for correctness. Balanced cost and quality. (e.g. Claude Sonnet, GPT-4o, Gemini Pro) | Solid engineer, handles complexity | Complex Builder + Reviewer (first-pass quality gate before reasoning model) |
| Senior | Reasoning model — deep, deliberate reasoning for high-stakes decisions. Thinks through architectural trade-offs, anticipates downstream consequences, diagnoses what simpler models can't resolve. Use where the cost of a wrong decision outweighs the cost of the model. (e.g. Claude Opus, o1/o3, Gemini Ultra) | Architect, expensive, thinks before acting | Planner (writes the brief) + Auditor (escalation) + final approval |
The eight roles:
| Role | Tier | Capability | Parallelism |
|---|---|---|---|
| Planner | Reasoning model | Opens every task with an architecture brief and clear instructions per tier. Closes with final approval after Reviewer pre-screens. Never writes code directly. | Main context only |
| Auditor | Reasoning model | Dispatched after 2 failed attempts — diagnoses the root constraint, redesigns the approach, re-briefs the team. Called to think, not to code. | On demand only |
| Finder | Fast model | Codebase search — find files, trace call chains, map patterns. Read-only. | Parallel-safe |
| Researcher | Fast model | External knowledge — library docs, API refs, version-specific behavior. Read-only. | Parallel-safe |
| builder-fast | Fast model | Simple, well-defined tasks — boilerplate, renames, stubs. | Parallel where non-overlapping |
| builder-smart | Smart model | Complex implementation — core logic, algorithms, non-trivial code. | Serialized by file |
| Reviewer | Smart model | First-pass quality gate — catches issues, patches small problems, ensures work is solid before Planner sees it. | After Builders; parallel-safe |
| Tester | Fast model | Confirms correctness — runs tests, checks logs, validates done criteria. | After Builders; parallel-safe |
Rule of thumb for Finder vs Researcher: "Where is X in the code?" → Finder. "How does this library work?" → Researcher.
Planner tool frontmatter: The Planner's static tools: frontmatter excludes Agent, TaskCreate, TaskUpdate, TaskList — these are deferred in some harness configurations and would trigger InputValidationError on first dispatch. The Planner self-loads them via ToolSearch on first invocation, wired in a ## Tool preload section of planner.md.
Pipeline:
Planner [reasoning model]
│ writes the brief: architecture decisions, constraints,
│ clear instructions for each tier
│
├── Finders + Researchers [fast model, parallel]
│ map codebase, fetch docs → write findings.md
│
├── Builders [fast model, well-defined tasks, parallel where non-overlapping]
│ boilerplate, simple edits, test stubs, renames
│
├── Builders [smart model, complex tasks, serialized by file]
│ core logic, algorithms, non-trivial implementation
│
├── Reviewer [smart model, first-pass quality gate]
│ catches issues, patches small problems
│ only escalates what's genuinely ready
│
└── Planner [reasoning model, final approval]
sees only pre-screened, Reviewer-approved work
checks architecture fit, consistency, correctness
approves before merge — marks task done
If a problem survives 2 failed attempts at any stage, dispatch Auditor (reasoning model) to diagnose and re-brief before continuing.
Why serializing Builders matters: Two agents editing the same file simultaneously produce conflicts. Finders, Researchers, and Testers are safe to parallelize because they're read-only. Builders should be scoped to non-overlapping files — if two Builders must touch the same file, run them sequentially.
findings.md — intra-session communication:
When Finders or Researchers discover something other agents need to know before acting, they write it here. Builders read it before starting. This is ephemeral — delete it when the session closes.
## 2026-04-10: Auth middleware in flux
- The old `src/middleware/auth.ts` is being replaced by `src/auth/` (in-progress branch)
- New endpoints should use `src/auth/session.ts`, not the old middleware
- Affects: any task touching auth, sessions, or protected routesDifference from blockers: Findings are things agents discovered — they inform without blocking. Blockers are things agents couldn't resolve — they require user input before work continues.
/agentic <task> [--tier=trivial|medium|full] is the front door for non-trivial work. The tier flag is the cost/speed lever — it controls pipeline depth and therefore which model tiers get involved.
| Tier | Pipeline | When to use |
|---|---|---|
trivial |
Planner brief → one builder-fast → done. Skips Finders, Researchers, Reviewer, Tester. |
Rename, typo, config tweak, single-line fix, flag addition, doc edit |
medium (default) |
Planner → Finders/Researchers (parallel) → Builders. Skips Reviewer + Tester. | Small feature with 3–6 steps, scoped refactor, bug fix with tests |
full |
Full pipeline — Finders/Researchers → Builders → Reviewer → Tester → Planner approves. | Cross-cutting change, schema/migration, security-adjacent, high-stakes refactor |
Rules:
- Never default to
full— the user opts in for genuinely risky work. trivialmust NOT fall through tomediumas a safety net; skipping Reviewer is the point.- Ambiguous task (2+ interpretations) → dispatch Planner at the inferred tier, but instruct it to ask a clarifying question BEFORE dispatching subordinates.
- Under Orchestrator mode, the main chat infers the tier automatically. Use
/agentic <task> --tier=Xonly when you want to pin a specific tier or force-dispatch when the reflex rules would skip.
Blockers and handoffs use fixed formats so the SessionStart hook and slash commands can parse them reliably. Deviating from the format produces silent failures — the hook won't detect active blockers, so you won't get the warning at session start.
# Active Blockers
## YYYY-MM-DD HH:MM — <summary>
- Context: <current task + what you were doing>
- Blocker: <what you cannot resolve from code, docs, or git history>
- What I need: <the decision you need from the user>
- Files involved: <path list>The H2 header MUST start with ## followed by a 4-digit year. The SessionStart hook checks grep -qE '^## [0-9]{4}-' — if the heading depth is wrong (e.g. ### instead of ##), the hook silently misses it.
# <task>
## Status
<what was done>
## Next
- [ ] <next step>
## Open questions
<list or "None">
## Files touched
- <path>Flat append log — no structural requirements beyond readability. Ephemeral: delete on session close, never carry over to the next session.
| Command | Purpose |
|---|---|
/agentic <task> [--tier=trivial|medium|full] |
Explicit one-shot dispatch with tier control. Under Orchestrator mode the pipeline runs automatically — use this to pin a tier or force-dispatch when reflex rules would skip. |
/init-agentic |
Scaffold .claude/mytasks/ + docs/KNOWN_ISSUES.md in the current project. Run this once per project before using the framework. |
/handoff <task-name> |
Write a cross-session handoff from current session context. |
/blocker <summary> |
Append a decision blocker in canonical format and halt the current task. |
/known-issue <summary> |
Append a persistent constraint to docs/KNOWN_ISSUES.md. |
Framework paths are pre-allowed in ~/.claude/settings.json (global scope) so sessions don't prompt when agents write to them:
Write(.claude/mytasks/**)
Edit(.claude/mytasks/**)
Write(.claude/mytasks/handoffs/**)
Edit(.claude/mytasks/handoffs/**)
Write(docs/KNOWN_ISSUES.md)
Edit(docs/KNOWN_ISSUES.md)
One-time setup, covers all projects. Scope matches framework footprint — no broader write access granted.
On every session start, a hook (installed in ~/.claude/settings.json) scans the CWD for .claude/mytasks/ and prints:
📋 Open handoff: <path>— for each.mdfile inhandoffs/⚠️ Active blockers: .claude/mytasks/blockers.md— ifblockers.mdcontains canonical date-stamped entries✓ agentic: armed— when.claude/mytasks/exists but has nothing pending
Silent no-op if .claude/mytasks/ does not exist (framework not initialized for this project).
Exit-code discipline: The hook uses a found flag variable so the condition flow always exits 0, preventing Claude Code from surfacing hook output as an error rather than a system message.
project-root/
├── .claude/ ← gitignored
│ ├── settings.json ← Claude Code config (hooks wired here)
│ ├── CLAUDE.md ← project instructions
│ └── mytasks/ ← agentic working files
│ ├── todo.md ← task plan with done criteria
│ ├── blockers.md ← agent decision blockers
│ ├── findings.md ← intra-session discoveries (ephemeral)
│ └── handoffs/
│ ├── notification-grouping.md
│ └── calendar-sync-fix.md
└── docs/
└── KNOWN_ISSUES.md ← committed (permanent project knowledge)
~/.claude/ ← global Claude Code config
├── AGENTIC.md ← framework spec (loaded every session)
├── agents/
│ ├── planner.md
│ ├── auditor.md
│ ├── finder.md
│ ├── researcher.md
│ ├── builder-fast.md
│ ├── builder-smart.md
│ ├── reviewer.md
│ └── tester.md
├── commands/
│ ├── agentic.md
│ ├── init-agentic.md
│ ├── handoff.md
│ ├── blocker.md
│ └── known-issue.md
└── hooks/
└── orchestrator.sh ← UserPromptSubmit reinforcement hook
A companion file (agentic-framework-v3.md in this repo, published as a gist) is the self-installer. Paste its raw gist URL into a fresh Claude Code session and it drops all files, hooks, and commands into ~/.claude/. This doc is the explainer; that doc is the installer.
The installer:
- Writes all eight agent files under
~/.claude/agents/ - Writes all five slash command files under
~/.claude/commands/ - Writes
orchestrator.shunder~/.claude/hooks/and makes it executable - Patches
~/.claude/settings.jsonwith the SessionStart hook and UserPromptSubmit hook entries - Verifies all nine install checks and reports PASS/FAIL
The installer drops all files fresh: the AGENTIC.md spec, eight subagent definitions under ~/.claude/agents/, five slash commands under ~/.claude/commands/, and the orchestrator reinforcement hook under ~/.claude/hooks/. Hook entries are added to ~/.claude/settings.json. A @AGENTIC.md import is appended to ~/.claude/CLAUDE.md.
Per-project initialization (after the global install): run /init-agentic in any project to scaffold .claude/mytasks/ and docs/KNOWN_ISSUES.md.
The components above address gaps in multi-session and multi-agent work. They depend on foundational rules in your CLAUDE.md that govern how Claude Code plans, executes, and verifies work.
- Enter plan mode for any non-trivial task (3+ steps or architectural decisions).
- If something goes sideways, STOP and re-plan immediately — don't keep pushing.
- 2-strike rule: After 2 failed approaches to the same problem, STOP. Do not try a 3rd. Dispatch an Auditor (reasoning model) to diagnose the root constraint, then re-plan from that constraint.
- Write plan to
.claude/mytasks/todo.mdwith checkable items and done criteria. - Check in before starting implementation.
- Track progress, mark items complete, high-level summary at each step.
Plan mode is what creates .claude/mytasks/todo.md — the file where Definition of Done criteria live. The 2-strike rule complements blockers: if two approaches fail, that's a signal to write a blocker rather than try a third blind guess.
- Use subagents liberally — one task per subagent, keep main context clean.
- Assign each subagent a role (see Agent Roles above).
- Planner opens every non-trivial task with the brief and closes with final approval after Reviewer pre-screens.
- Run subagents in background — dispatched agents should run in the background so the Planner context stays free to receive steering, answer blockers, and coordinate mid-task. A blocked Planner defeats the parallel pipeline.
- Verify unknowns before dispatching — confirm APIs, commands, and library behavior via context7 or web search before writing the brief. Agents looping on nonexistent commands waste cycles and compound into blockers.
- Clarify before starting: If a request has 2+ plausible interpretations, name them and ask before writing code. Don't guess and proceed.
- Surgical changes: Touch only what the task requires. Don't improve adjacent code, comments, or formatting. Remove imports/variables/functions that YOUR changes made unused — leave pre-existing dead code alone; mention it instead.
- Platform constraints first: Check
docs/KNOWN_ISSUES.mdand research known limitations before proposing solutions. Don't trial-and-error against platform walls. - When given a bug report: just fix it. Zero context switching for the user.
- Understand WHY code is written that way — don't assume it's wrong. If unsure, ask. Working code is correct until proven otherwise.
Subagents are the agents that write and read handoffs, blockers, and findings. Without the subagent convention and role taxonomy, there's no multi-agent workflow to coordinate. The platform constraints rule is what triggers checking docs/KNOWN_ISSUES.md — connecting known issues to the moment they're most useful.
- Never mark a task complete without proving it works.
- Run tests, check logs, demonstrate correctness.
- For UI changes: use screenshots or browser automation to verify rendering.
- Ask yourself: "Would a senior engineer approve this?"
- Don't push validation work to the user.
This is the enforcement mechanism for Definition of Done. Without it, done criteria exist on paper but nothing forces the agent to actually check them before marking a task complete.
| Situation | Action |
|---|---|
| Starting any non-trivial task | Just type the task — Orchestrator auto-dispatches. Use /agentic <task> --tier=X only to pin a specific tier. |
| Closing CC, will continue tomorrow | /handoff <name> |
| Closing CC after a multi-agent session | Delete findings.md (ephemeral — don't carry over) |
| Agent hits ambiguity it cannot resolve | /blocker <summary> — write it, then halt |
| Found a platform limitation that'll bite again | /known-issue <summary> |
| Planning a non-trivial task | Add done criteria to each item |
| Request has 2+ plausible interpretations | Clarify first, don't start |
| Starting any non-trivial task | Planner (reasoning model) writes the brief first |
| Need to find files or trace code patterns | Dispatch finder (fast model, parallel) |
| Need library docs or API references | Dispatch researcher (fast model) |
| Dispatching any subagent | Run in background — keeps Planner free to coordinate |
| Uncertain about an API, command, or library | Verify with context7 or web search before writing the brief |
| Finder/Researcher discovers something agents need | Write to findings.md |
| Implementing well-defined, scoped tasks | builder-fast (parallel where non-overlapping) |
| Implementing complex logic or core code | builder-smart (serialized by file) |
| Reviewing implementation before Planner sees it | reviewer (smart model) |
| Verifying done criteria | tester (fast model) after Builders |
| Problem survived 2 failed attempts | Dispatch auditor (reasoning model) to re-diagnose and re-brief |
| Simple bug fix, single session | None of this — just fix it |
Adapted from a community-shared orchestration framework for greenfield multi-agent builds. The original framework used agent prompt files, orchestration shell scripts, and rigid file permission scoping — patterns designed for building from scratch. This version strips it down to what works in existing, mature codebases where architecture is already established and most work is features and fixes.