This document describes an engineering maturity model for AI-assisted development, from manual coding to parallel autonomous execution across worktrees/branches.
It is intended to help us:
- name the current operating mode clearly
- understand the risks and constraints of each mode
- define the next capability to add (instead of just "using more AI")
- keep verification, review, and deployment discipline as autonomy increases
- Manual Coding
- Manual AI Assistance (IDE Integration)
- Supervised AI Coding (CLI / RALPH)
- Unsupervised AI Coding (RALPH AFK / scheduled loops)
- Parallel Unsupervised AI Coding (worktrees / one agent per branch)
- Orchestrated Multi-Agent Delivery (planner + workers + reviewer)
- Continuous Autonomous Engineering (always-on maintenance + product iteration)
flowchart LR
A["Manual Coding"] --> B["Manual AI Assistance (IDE)"]
B --> C["Supervised AI Coding (CLI / RALPH)"]
C --> D["Unsupervised AI Coding (AFK loops)"]
D --> E["Parallel Unsupervised AI Coding (Worktrees)"]
E --> F["Orchestrated Multi-Agent Delivery"]
F --> G["Continuous Autonomous Engineering"]
Human writes the code directly. Tooling is traditional: editor, tests, debugger, docs, CI.
- Implement `POST /api/places/search` manually.
- Read `PRD.md`, write route, client, cache logic, UI updates.
- Run `pnpm typecheck`, `pnpm lint`, `pnpm test:run`, `pnpm build`.
- Commit and push.
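The validation step in the list above can be sketched as a small fail-fast runner. This is only an illustration: the helper names (`run_validation`, `_run`) are hypothetical, and it assumes the four `pnpm` scripts exist in the project.

```python
import subprocess
from typing import Callable, Sequence

# Commands taken from the list above; assumes the pnpm scripts are defined.
VALIDATION_COMMANDS = [
    ["pnpm", "typecheck"],
    ["pnpm", "lint"],
    ["pnpm", "test:run"],
    ["pnpm", "build"],
]


def _run(command: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(command)).returncode


def run_validation(
    commands: Sequence[Sequence[str]] = VALIDATION_COMMANDS,
    run: Callable[[Sequence[str]], int] = _run,
) -> bool:
    """Run each command in order and stop at the first non-zero exit code."""
    for command in commands:
        if run(command) != 0:
            print(f"validation failed: {' '.join(command)}")
            return False
    return True
```

Stopping at the first failure keeps the feedback loop short; the `run` parameter is injectable so the sequence can be exercised without invoking `pnpm`.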
- highest control
- best for subtle architecture changes
- easiest to reason about ownership and intent
- slowest iteration speed
- repetitive scaffolding work consumes focus
- context switching cost remains high
flowchart TD
H["Human"] --> P["Plan"]
P --> I["Implement"]
I --> V["Validate"]
V --> C["Commit / Push"]
Human remains the driver. AI helps with local edits, refactors, snippets, and explanations inside the editor.
- Ask IDE assistant to refactor a React component to use a server component wrapper + client child.
- Ask for a type-safe mapper from Google Places response to `PlaceRecord`.
- Human reviews and adjusts the output before saving.
- very fast for local transformations
- low setup friction
- good for explanations and small refactors
- weak long-running execution model
- poor task memory across sessions unless documented
- easy to produce inconsistent patterns without project rules
sequenceDiagram
participant Dev as Developer
participant IDE as IDE
participant AI as AI Assistant
Dev->>AI: "Refactor this component"
AI-->>IDE: Suggested edits
Dev->>IDE: Accept / modify / reject
Dev->>IDE: Run tests locally
Human delegates one scoped task to an agent via CLI, then reviews the diff and validation before deciding to commit or iterate.
This is the current core workflow for this repo.
- Run `/.ralph/run.sh --agent codex --mode feature --no-commit`
- Agent selects one concrete task from `progress.txt`
- Agent implements + validates + updates `progress.txt`
- Human reviews output and reruns if needed
- strong task focus when paired with `progress.txt`
- reproducible prompt + logs
- good balance of speed and oversight
- quality depends heavily on task clarity
- prompt drift can cause inconsistent behavior
- single agent means serial throughput
flowchart TD
U["Human picks mode"] --> R["RALPH run (1 task)"]
R --> X["Agent edits + validates"]
X --> S["Update progress.txt"]
S --> Review["Human review"]
Review -->|good| Commit["Commit / Push"]
Review -->|needs work| Retry["Refine prompt/spec and rerun"]
Agent runs repeatedly without per-iteration human input. The loop enforces scope, validation, and progress logging.
- Run `/.ralph/codex-loop.sh 50 3`
- Each iteration:
  - run one RALPH task
  - validate
  - update `progress.txt`
  - commit/push if successful
- Stop early after repeated no-progress iterations
- high throughput on well-specified backlogs
- good for implementation queues and cleanup tasks
- builds a documented execution trail in `progress.txt`
- bad task specs waste many cycles
- runtime/integration issues can recur without human intervention
- requires strict guardrails (validation, commit scope, logs)
flowchart TD
Start["AFK loop start"] --> I1["Iteration n"]
I1 --> Run["Run RALPH (one task)"]
Run --> Val["Validate"]
Val --> Prog["Update progress.txt"]
Prog --> Check["Progress detected? (worktree or HEAD)"]
Check -->|yes| Next["Sleep and continue"]
Check -->|no, repeated| Stop["Stop early"]
Next --> I1
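The loop in the diagram above can be sketched roughly as follows. This is not the actual `codex-loop.sh` logic and does not attempt to interpret its `50 3` arguments; `afk_loop`, `git_snapshot`, and all parameter names are hypothetical. Progress is approximated by comparing `git rev-parse HEAD` plus `git status --porcelain` before and after each iteration.

```python
import subprocess
import time
from typing import Callable


def git_snapshot() -> str:
    """Fingerprint of HEAD plus uncommitted worktree changes."""
    head = subprocess.run(
        ["git", "rev-parse", "HEAD"], capture_output=True, text=True
    ).stdout
    status = subprocess.run(
        ["git", "status", "--porcelain"], capture_output=True, text=True
    ).stdout
    return head + status


def afk_loop(
    run_task: Callable[[], None],
    snapshot: Callable[[], str],
    max_iterations: int,
    max_stalled: int,
    sleep_seconds: float = 0.0,
) -> str:
    """Run up to max_iterations tasks; stop early after max_stalled
    consecutive iterations that leave the snapshot unchanged."""
    stalled = 0
    for _ in range(max_iterations):
        before = snapshot()
        run_task()  # one task: implement, validate, update progress.txt
        if snapshot() != before:
            stalled = 0  # progress detected, reset the counter
        else:
            stalled += 1
            if stalled >= max_stalled:
                return "stopped: no progress"
        time.sleep(sleep_seconds)
    return "finished: iteration limit reached"
```

Counting only consecutive stalled iterations means a single idle run does not end the loop, while a stuck agent is cut off before it burns the whole iteration budget.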
Multiple agents run simultaneously, each in an isolated worktree and branch, usually one task per worktree.
This is the natural next step after AFK loops.
- Worktree A: `P5` density grid calculation
- Worktree B: Playwright + DB seed scripts
- Worktree C: Better Auth key encryption at rest
- Each agent runs its own RALPH loop and commits to its branch
- Human or reviewer agent merges via queue after CI passes
- isolation of dependencies and diffs
- avoids task collisions in the same working tree
- makes parallel execution inspectable and reversible
- two agents modify the same files or concepts
- branch drift / merge conflicts
- duplicated effort without task leasing
- CI capacity and cost spikes
flowchart TB
Planner["Task Queue / Planner"] --> W1["Worktree A (branch: p5-density)"]
Planner --> W2["Worktree B (branch: p6-playwright)"]
Planner --> W3["Worktree C (branch: p7-encryption)"]
W1 --> CI1["CI"]
W2 --> CI2["CI"]
W3 --> CI3["CI"]
CI1 --> MQ["Merge Queue"]
CI2 --> MQ
CI3 --> MQ
MQ --> Main["main"]
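As an illustration of the one-branch-per-worktree setup, the sketch below builds the `git worktree add -b <branch> <path> <start-point>` command for each branch named in the diagram. The function name, the base directory, and the choice of `main` as the start point are assumptions, not part of the existing tooling.

```python
from typing import Dict, List


def worktree_commands(
    base_dir: str, branches: List[str], start_point: str = "main"
) -> Dict[str, List[str]]:
    """Return one `git worktree add` command per branch, keyed by branch name."""
    commands: Dict[str, List[str]] = {}
    for branch in branches:
        # Each branch gets its own directory under base_dir.
        path = f"{base_dir.rstrip('/')}/{branch}"
        commands[branch] = ["git", "worktree", "add", "-b", branch, path, start_point]
    return commands


if __name__ == "__main__":
    for cmd in worktree_commands(
        "../worktrees", ["p5-density", "p6-playwright", "p7-encryption"]
    ).values():
        print(" ".join(cmd))
```

Building the commands separately from executing them makes the plan easy to inspect or log before any worktree is created.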
Roles become specialized. One agent plans, multiple agents implement, another agent reviews, and a merge gate decides what lands.
- Planner agent:
  - decomposes PRD into parallelizable tasks
  - writes/refines task specs in `progress.txt` or structured tasks file
- Worker agents:
  - implement one task per worktree
  - run validation + commit
- Reviewer agent:
  - reviews diffs for regressions/security risks
  - confirms task acceptance criteria
- Merge gate:
  - CI required
  - reviewer findings resolved
  - conflict-free rebase / merge queue
- much higher throughput with bounded risk
- better separation of implementation vs review
- easier to scale across engineering surfaces (frontend, backend, infra)
- needs orchestration tooling (task leases, worktree lifecycle, merge queue)
- requires stronger structured task specs than freeform prompts
- verification quality becomes the bottleneck
flowchart LR
P["Planner Agent"] --> T["Task Specs"]
T --> A["Worker A"]
T --> B["Worker B"]
T --> C["Worker C"]
A --> R["Reviewer Agent"]
B --> R
C --> R
R --> G["Merge Gate (CI + policy)"]
G --> M["main"]
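The merge gate's three conditions (CI required, reviewer findings resolved, conflict-free rebase) can be expressed as a simple policy check. The sketch below is a minimal illustration; `BranchStatus` and its fields are hypothetical names, and how each value is collected (CI API, reviewer output, a trial rebase) is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BranchStatus:
    branch: str
    ci_passed: bool
    open_findings: List[str] = field(default_factory=list)
    rebases_cleanly: bool = True


def merge_gate(status: BranchStatus) -> List[str]:
    """Return the reasons a branch is blocked; an empty list means it may land."""
    reasons: List[str] = []
    if not status.ci_passed:
        reasons.append("CI has not passed")
    if status.open_findings:
        reasons.append(f"{len(status.open_findings)} unresolved reviewer finding(s)")
    if not status.rebases_cleanly:
        reasons.append("branch does not rebase cleanly onto main")
    return reasons
```

Returning the list of blocking reasons, rather than a bare boolean, gives the worker or a human something concrete to act on when a branch is held back.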
AI systems continuously maintain and improve the codebase, not only when manually invoked.
Humans remain accountable for product direction, architecture, risk decisions, and policy.
- dependency upgrades and compatibility fixes
- flaky test detection and stabilization
- security review/patch loops
- docs drift detection
- performance regression checks
- backlog grooming from production feedback
- automation runs become first-class artifacts
- agents need durable memory and task state
- observability/evals are as important as prompts
- rollback and release controls become mandatory
flowchart TD
Signals["Signals: CI, prod metrics, errors, feedback"] --> Triage["Planner / Triage Agent"]
Triage --> Queue["Task Queue"]
Queue --> Workers["Worker Agents (parallel)"]
Workers --> Review["Reviewer / Policy Agents"]
Review --> Deploy["Deploy Gate"]
Deploy --> Prod["Production"]
Prod --> Signals
These are cross-cutting capabilities that usually appear before or during the later stages.
Agents are driven by scripts and structured IO rather than ad hoc prompts alone.
- CI job generates a task payload JSON.
- Runner executes agent with a fixed prompt template and machine-readable result contract.
- Pipeline parses result and routes to review or retry.
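A minimal sketch of the payload and result handling described above, assuming a hypothetical JSON contract in which the agent prints an object with a `status` field. The field names and the two routes are illustrative assumptions, not an existing schema.

```python
import json


def build_payload(task_id: str, goal: str) -> str:
    """Serialize a task payload for the runner (hypothetical fields)."""
    return json.dumps({"task_id": task_id, "goal": goal})


def route_result(raw_output: str) -> str:
    """Route to 'review' on a well-formed success result, otherwise 'retry'."""
    try:
        result = json.loads(raw_output)
    except json.JSONDecodeError:
        return "retry"  # output that breaks the contract is treated as a failure
    if isinstance(result, dict) and result.get("status") == "success":
        return "review"
    return "retry"
```

Treating malformed output the same as a failed run keeps the pipeline from silently passing along results it cannot parse.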
Tasks become machine-checkable and explicit (acceptance criteria, files, validation, out-of-scope).
- `progress.txt` or `tasks.json` contains:
  - goal
  - scope
  - validation commands
  - done criteria
- Agent is only allowed to pick tasks that meet the spec format.
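One way to enforce the "only pick tasks that meet the spec format" rule is a small check that runs before task selection. The sketch assumes tasks are dictionaries with the keys `goal`, `scope`, `validation`, and `done`; these key names are hypothetical stand-ins for the four items listed above.

```python
from typing import Dict, List

# Hypothetical key names for: goal, scope, validation commands, done criteria.
REQUIRED_FIELDS = ("goal", "scope", "validation", "done")


def missing_fields(task: Dict) -> List[str]:
    """Names of required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not task.get(name)]


def pickable_tasks(tasks: List[Dict]) -> List[Dict]:
    """Keep only the tasks that satisfy the spec format."""
    return [task for task in tasks if not missing_fields(task)]
```

Tasks that fail the check stay in the backlog for a human or planner to complete rather than being attempted with an underspecified goal.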
A reviewer agent runs even when implementation succeeded, before merge.
- Worker commits feature branch
- Reviewer agent runs "find regressions/security issues only"
- Inline findings become blocking checks
The next gains will come less from larger prompts and more from better systems.
- stronger task specs and task leasing
- parallel worktree orchestration
- dedicated reviewer/security agents
- merge queues and policy gates
- ephemeral per-agent environments (DB seed + preview URL)
- continuous maintenance automations
- define constraints, not line-by-line code
- set acceptance criteria and risk tolerance
- approve architectural changes
- review ambiguous tradeoffs and product direction
- design the verification system
mindmap
  root((Future AI Engineering))
    Orchestration
      Planner
      Task leasing
      Worktrees
      Merge queue
    Verification
      Typecheck
      Tests
      Build
      Security review
      Runtime checks
    Operations
      Preview envs
      Seeded DBs
      Rollbacks
      Observability
    Human Oversight
      Architecture
      Prioritization
      Risk decisions
      Product intent
- Add a worktree-aware RALPH runner (`/.ralph/worktree-loop.sh`)
- Introduce task leasing to prevent two agents from taking the same task
- Add a `review` mode (diff review + findings only)
- Add a merge gate workflow for multi-branch agent output
- Gradually move `progress.txt` task specs to a stricter structure (can still stay human-readable)
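Task leasing could be prototyped with exclusive file creation, assuming all agents share a filesystem (for example, worktrees on one machine). The sketch below is one possible approach, not a committed design: the lease directory layout and function names are hypothetical, and it has no expiry, so a crashed agent would hold its lease until it is released manually.

```python
import os
from pathlib import Path


def try_acquire_lease(lease_dir: str, task_id: str, agent: str) -> bool:
    """Atomically claim a task; return False if another agent already holds it."""
    Path(lease_dir).mkdir(parents=True, exist_ok=True)
    lease_path = Path(lease_dir) / f"{task_id}.lease"
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one
        # caller can create the lease file.
        fd = os.open(lease_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(agent)  # record the owner for later inspection
    return True


def release_lease(lease_dir: str, task_id: str) -> None:
    """Remove the lease file so the task can be claimed again."""
    (Path(lease_dir) / f"{task_id}.lease").unlink(missing_ok=True)
```

An agent would call `try_acquire_lease` before starting a task and skip to the next candidate when it returns `False`.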
As autonomy increases, prompt quality matters less than:
- task clarity
- isolation
- verification
- review discipline
- merge/deploy guardrails