AI Coding Evolution

This document describes an engineering maturity model for AI-assisted development, from manual coding to parallel autonomous execution across worktrees/branches.

It is intended to help us:

name the current operating mode clearly
understand the risks and constraints of each mode
define the next capability to add (instead of just "using more AI")
keep verification, review, and deployment discipline as autonomy increases

The Ladder

1. Manual Coding
2. Manual AI Assistance (IDE Integration)
3. Supervised AI Coding (CLI / RALPH)
4. Unsupervised AI Coding (RALPH AFK / scheduled loops)
5. Parallel Unsupervised AI Coding (worktrees / one agent per branch)
6. Orchestrated Multi-Agent Delivery (planner + workers + reviewer)
7. Continuous Autonomous Engineering (always-on maintenance + product iteration)

flowchart LR
  A["Manual Coding"] --> B["Manual AI Assistance (IDE)"]
  B --> C["Supervised AI Coding (CLI / RALPH)"]
  C --> D["Unsupervised AI Coding (AFK loops)"]
  D --> E["Parallel Unsupervised AI Coding (Worktrees)"]
  E --> F["Orchestrated Multi-Agent Delivery"]
  F --> G["Continuous Autonomous Engineering"]

1. Manual Coding

Human writes the code directly. Tooling is traditional: editor, tests, debugger, docs, CI.

Example

Implement POST /api/places/search manually.
Read PRD.md, write route, client, cache logic, UI updates.
Run pnpm typecheck, pnpm lint, pnpm test:run, pnpm build.
Commit and push.
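
The four validation commands in this example are the same gate that every later stage relies on, so it can help to run them as one fail-fast step. The following is a minimal sketch, assuming `pnpm` is on the PATH and that the script names match the example; `run_validation` and its injectable `run` parameter are hypothetical names, not part of this repo.

```python
import subprocess
from typing import Callable, Optional, Sequence

# The four validation commands named in the example above.
VALIDATION_COMMANDS = [
    ["pnpm", "typecheck"],
    ["pnpm", "lint"],
    ["pnpm", "test:run"],
    ["pnpm", "build"],
]


def run_validation(
    commands: Sequence[Sequence[str]] = VALIDATION_COMMANDS,
    run: Optional[Callable[[Sequence[str]], int]] = None,
) -> tuple:
    """Run each command in order and stop at the first failure.

    Returns (all_passed, commands_that_ran). `run` takes a command and
    returns its exit code; it is injectable so the control flow can be
    exercised without pnpm installed (an assumption made for testability).
    """
    if run is None:
        run = lambda cmd: subprocess.run(list(cmd)).returncode
    ran = []
    for cmd in commands:
        ran.append(" ".join(cmd))
        if run(cmd) != 0:
            # Fail fast: later steps are skipped once one step fails.
            return False, ran
    return True, ran
```

Stopping at the first failure keeps feedback short, at the cost of hiding failures in later steps until the earlier ones are fixed.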

Strengths

highest control
best for subtle architecture changes
easiest to reason about ownership and intent

Limitations

slowest iteration speed
repetitive scaffolding work consumes focus
context switching cost remains high

flowchart TD
  H["Human"] --> P["Plan"]
  P --> I["Implement"]
  I --> V["Validate"]
  V --> C["Commit / Push"]

2. Manual AI Assistance (IDE Integration)

Human remains the driver. AI helps with local edits, refactors, snippets, and explanations inside the editor.

Example

Ask IDE assistant to refactor a React component to use a server component wrapper + client child.
Ask for a type-safe mapper from Google Places response to PlaceRecord.
Human reviews and adjusts the output before saving.

Strengths

very fast for local transformations
low setup friction
good for explanations and small refactors

Limitations

weak long-running execution model
poor task memory across sessions unless documented
easy to produce inconsistent patterns without project rules

sequenceDiagram
  participant Dev as Developer
  participant IDE as IDE
  participant AI as AI Assistant
  Dev->>AI: "Refactor this component"
  AI-->>IDE: Suggested edits
  Dev->>IDE: Accept / modify / reject
  Dev->>IDE: Run tests locally

3. Supervised AI Coding (CLI / RALPH)

Human delegates one scoped task to an agent via CLI, then reviews the diff and validation before deciding to commit or iterate.

This is the current core workflow for this repo.

Example

Run /.ralph/run.sh --agent codex --mode feature --no-commit
Agent selects one concrete task from progress.txt
Agent implements + validates + updates progress.txt
Human reviews output and reruns if needed
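
The "select one concrete task" step can be sketched as follows. The layout of progress.txt is not specified in this document, so the checkbox format below (`- [ ]` for open, `- [x]` for done) is purely an assumption, and `pick_next_task` is a hypothetical helper.

```python
import re
from typing import Optional

# ASSUMPTION: open tasks look like "- [ ] description" and finished tasks
# like "- [x] description". The real progress.txt layout may differ.
OPEN_TASK = re.compile(r"^\s*-\s\[\s\]\s+(?P<task>.+?)\s*$")


def pick_next_task(progress_text: str) -> Optional[str]:
    """Return the first unchecked task, or None when nothing is open."""
    for line in progress_text.splitlines():
        match = OPEN_TASK.match(line)
        if match:
            return match.group("task")
    return None
```

Taking the first open task keeps each run scoped to one item, which matches the one-task-per-run rule described above.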

Strengths

strong task focus when paired with progress.txt
reproducible prompt + logs
good balance of speed and oversight

Limitations

quality depends heavily on task clarity
prompt drift can cause inconsistent behavior
single agent means serial throughput

flowchart TD
  U["Human picks mode"] --> R["RALPH run (1 task)"]
  R --> X["Agent edits + validates"]
  X --> S["Update progress.txt"]
  S --> Review["Human review"]
  Review -->|good| Commit["Commit / Push"]
  Review -->|needs work| Retry["Refine prompt/spec and rerun"]

4. Unsupervised AI Coding (RALPH AFK / Loops)

Agent runs repeatedly without per-iteration human input. The loop enforces scope, validation, and progress logging.

Example

Run /.ralph/codex-loop.sh 50 3
Each iteration:
- run one RALPH task
- validate
- update progress.txt
- commit/push if successful
Stop early after repeated no-progress iterations

Strengths

high throughput on well-specified backlogs
good for implementation queues and cleanup tasks
builds a documented execution trail in progress.txt

Limitations

bad task specs waste many cycles
runtime/integration issues can recur without human intervention
requires strict guardrails (validation, commit scope, logs)

flowchart TD
  Start["AFK loop start"] --> I1["Iteration n"]
  I1 --> Run["Run RALPH (one task)"]
  Run --> Val["Validate"]
  Val --> Prog["Update progress.txt"]
  Prog --> Check["Progress detected? (worktree or HEAD)"]
  Check -->|yes| Next["Sleep and continue"]
  Check -->|no, repeated| Stop["Stop early"]
  Next --> I1
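
The loop and its early-stop check can be sketched as below. This is not the implementation of codex-loop.sh; `afk_loop`, `git_snapshot`, and the parameter names are hypothetical, and the sketch makes no claim about what the script's two numeric arguments mean. Progress is detected by comparing a fingerprint of HEAD plus the working tree before and after each iteration.

```python
import subprocess
import time
from typing import Callable


def git_snapshot() -> str:
    """Coarse fingerprint: the HEAD commit plus the list of changed paths."""
    head = subprocess.run(
        ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
    ).stdout.strip()
    changed = subprocess.run(
        ["git", "status", "--porcelain"], capture_output=True, text=True, check=True
    ).stdout
    return head + "\n" + changed


def afk_loop(
    run_task: Callable[[], None],
    max_iterations: int,
    max_stalls: int,
    snapshot: Callable[[], str] = git_snapshot,
    sleep_seconds: float = 0.0,
) -> tuple:
    """Run one task per iteration until the budget is spent or progress stalls.

    Returns (iterations_run, reason), where reason is "budget" or "stalled".
    """
    stalls = 0
    before = snapshot()
    for iteration in range(1, max_iterations + 1):
        run_task()
        after = snapshot()
        if after == before:
            # Neither HEAD nor the working tree changed: count a stall.
            stalls += 1
            if stalls >= max_stalls:
                return iteration, "stalled"
        else:
            stalls = 0
        before = after
        time.sleep(sleep_seconds)
    return max_iterations, "budget"
```

The fingerprint is deliberately coarse: a second edit to an already-modified file does not change the `git status --porcelain` output, so a stricter check would also compare diff contents.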

5. Parallel Unsupervised AI Coding (Worktrees / One Agent Per Branch)

Multiple agents run simultaneously, each in an isolated worktree and branch, usually one task per worktree.

This is the natural next step after AFK loops.

Example

Worktree A: P5 density grid calculation
Worktree B: Playwright + DB seed scripts
Worktree C: Better Auth key encryption at rest
Each agent runs its own RALPH loop and commits to its branch
Human or reviewer agent merges via queue after CI passes

Why worktrees matter

isolation of dependencies and diffs
avoids task collisions in the same working tree
makes parallel execution inspectable and reversible
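
Setting up one isolated worktree per agent could look like the sketch below. It relies on `git worktree add -b <branch> <path> <base>`, which creates a new branch and checks it out in a separate directory. The sibling `<repo>-worktrees` directory layout and the helper names are assumptions for illustration.

```python
import subprocess
from pathlib import Path


def worktree_add_command(repo: Path, branch: str, base: str = "main") -> list:
    """Build the `git worktree add` command for one agent's branch."""
    # ASSUMPTION: worktrees live in a sibling directory "<repo>-worktrees".
    path = repo.parent / f"{repo.name}-worktrees" / branch
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


def create_worktrees(repo: Path, branches: list, base: str = "main") -> None:
    """Create one worktree per branch; raises if any git call fails."""
    for branch in branches:
        subprocess.run(worktree_add_command(repo, branch, base), check=True)
```

Calling `create_worktrees(Path("."), ["p5-density", "p6-playwright", "p7-encryption"])` would create the three branches used in this section's example, each in its own directory, so that agents never share a working tree.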

New risks

two agents modify the same files or concepts
branch drift / merge conflicts
duplicated effort without task leasing
CI capacity and cost spikes
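
Task leasing, which addresses the duplicated-effort risk, can be sketched with an atomic file create. This is one possible approach under the assumption that all agents share a filesystem (true for worktrees on one machine); `try_lease`, `release`, and the `.lease` file naming are hypothetical.

```python
import os
from pathlib import Path


def try_lease(lease_dir: Path, task_id: str, agent: str) -> bool:
    """Claim a task by atomically creating its lease file.

    Returns True if this agent now holds the lease, False if another
    agent created the file first.
    """
    lease_dir.mkdir(parents=True, exist_ok=True)
    lease_path = lease_dir / f"{task_id}.lease"
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one
        # caller can succeed for a given task_id.
        fd = os.open(lease_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(agent)  # record the owner for later inspection
    return True


def release(lease_dir: Path, task_id: str) -> None:
    """Drop the lease so the task can be picked up again."""
    (lease_dir / f"{task_id}.lease").unlink(missing_ok=True)
```

The sketch has no expiry, so a crashed agent would hold its lease until someone releases it; a real implementation would likely add a timestamp and a timeout.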

flowchart TB
  Planner["Task Queue / Planner"] --> W1["Worktree A (branch: p5-density)"]
  Planner --> W2["Worktree B (branch: p6-playwright)"]
  Planner --> W3["Worktree C (branch: p7-encryption)"]

  W1 --> CI1["CI"]
  W2 --> CI2["CI"]
  W3 --> CI3["CI"]

  CI1 --> MQ["Merge Queue"]
  CI2 --> MQ
  CI3 --> MQ

  MQ --> Main["main"]

6. Orchestrated Multi-Agent Delivery

Roles become specialized. One agent plans, multiple agents implement, another agent reviews, and a merge gate decides what lands.

Example role split

Planner agent:
- decomposes PRD into parallelizable tasks
- writes/refines task specs in progress.txt or a structured tasks file
Worker agents:
- implement one task per worktree
- run validation + commit
Reviewer agent:
- reviews diffs for regressions/security risks
- confirms task acceptance criteria
Merge gate:
- CI required
- reviewer findings resolved
- conflict-free rebase / merge queue
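
The merge gate's three conditions can be expressed as a small policy function. This is a sketch only: `BranchStatus` and `gate_failures` are hypothetical names, and how each field gets populated (CI API, reviewer output, a trial rebase) is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchStatus:
    branch: str
    ci_passed: bool
    open_findings: int      # unresolved reviewer findings
    rebases_cleanly: bool   # True when the branch applies on main without conflicts


def gate_failures(status: BranchStatus) -> list:
    """Return the unmet gate conditions; an empty list means the branch may land."""
    failures = []
    if not status.ci_passed:
        failures.append("CI has not passed")
    if status.open_findings > 0:
        failures.append(f"{status.open_findings} reviewer finding(s) unresolved")
    if not status.rebases_cleanly:
        failures.append("branch conflicts with main")
    return failures
```

Returning the list of reasons, rather than a bare boolean, gives the worker agent or a human something concrete to act on when a branch is held back.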

Strengths

much higher throughput with bounded risk
better separation of implementation vs review
easier to scale across engineering surfaces (frontend, backend, infra)

Limitations

needs orchestration tooling (task leases, worktree lifecycle, merge queue)
requires structured task specs rather than freeform prompts
verification quality becomes the bottleneck

flowchart LR
  P["Planner Agent"] --> T["Task Specs"]
  T --> A["Worker A"]
  T --> B["Worker B"]
  T --> C["Worker C"]
  A --> R["Reviewer Agent"]
  B --> R
  C --> R
  R --> G["Merge Gate (CI + policy)"]
  G --> M["main"]

7. Continuous Autonomous Engineering

AI systems continuously maintain and improve the codebase, not only when manually invoked.

Humans remain accountable for product direction, architecture, risk decisions, and policy.

Example streams

dependency upgrades and compatibility fixes
flaky test detection and stabilization
security review/patch loops
docs drift detection
performance regression checks
backlog grooming from production feedback

What changes operationally

automation runs become first-class artifacts
agents need durable memory and task state
observability/evals are as important as prompts
rollback and release controls become mandatory

flowchart TD
  Signals["Signals: CI, prod metrics, errors, feedback"] --> Triage["Planner / Triage Agent"]
  Triage --> Queue["Task Queue"]
  Queue --> Workers["Worker Agents (parallel)"]
  Workers --> Review["Reviewer / Policy Agents"]
  Review --> Deploy["Deploy Gate"]
  Deploy --> Prod["Production"]
  Prod --> Signals

What Is Missing Between Stages (Useful Concepts)

These are cross-cutting capabilities that usually appear before or during the later stages.

Programmatic AI Coding

Agents are driven by scripts and structured IO rather than ad hoc prompts alone.

Example

CI job generates a task payload JSON.
Runner executes agent with a fixed prompt template and machine-readable result contract.
Pipeline parses result and routes to review or retry.
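
The parse-and-route step might look like the sketch below. The result contract is not defined in this document, so the JSON field names (`task_id`, `status`, `summary`), the retry budget, and the extra "escalate" outcome are all assumptions added for illustration.

```python
import json

# ASSUMPTION: the agent emits one JSON object containing at least these fields.
REQUIRED_FIELDS = {"task_id", "status", "summary"}


def route_result(raw: str, attempt: int, max_attempts: int = 3) -> str:
    """Decide the next pipeline step: "review", "retry", or "escalate"."""
    try:
        result = json.loads(raw)
    except json.JSONDecodeError:
        result = None
    well_formed = isinstance(result, dict) and REQUIRED_FIELDS.issubset(result)
    if well_formed and result["status"] == "success":
        return "review"
    # Malformed output is treated the same as a failed run.
    if attempt < max_attempts:
        return "retry"
    return "escalate"
```

Treating unparseable output as a failure is the point of a machine-readable contract: the pipeline never has to guess what the agent meant.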

Spec-Driven AI Coding

Tasks become machine-checkable and explicit (acceptance criteria, files, validation, out-of-scope).

Example

progress.txt or tasks.json contains:
- goal
- scope
- validation commands
- done criteria
Agent is only allowed to pick tasks that meet the spec format.
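
A spec check along these lines could enforce that rule. The four fields come from the list above; the snake_case key names, the expected types, and the helper names are assumptions, since no concrete schema is given here.

```python
from typing import Any

# Field names follow the list above; keys and types are assumptions.
REQUIRED = {
    "goal": str,
    "scope": list,
    "validation_commands": list,
    "done_criteria": list,
}


def spec_problems(task: dict) -> list:
    """Return a list of problems; an empty list means the spec is well-formed."""
    problems = []
    for key, expected in REQUIRED.items():
        value: Any = task.get(key)
        if not isinstance(value, expected):
            problems.append(f"{key}: expected {expected.__name__}")
        elif len(value) == 0:
            problems.append(f"{key}: must not be empty")
    return problems


def pickable(tasks: list) -> list:
    """Keep only the tasks an agent is allowed to pick."""
    return [task for task in tasks if not spec_problems(task)]
```

This only checks the shape of a spec, not whether the goal is sensible; it exists to keep underspecified tasks out of unattended loops.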

Review-First AI Coding

A reviewer agent runs before every merge, even when implementation and validation succeeded.

Example

Worker commits feature branch
Reviewer agent runs "find regressions/security issues only"
Inline findings become blocking checks
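
Turning reviewer findings into blocking checks can be sketched as follows. The `Finding` shape and the two blocking categories (taken from the "regressions/security issues only" prompt above) are assumptions about what a reviewer agent would emit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    category: str  # e.g. "regression", "security", or something non-blocking
    message: str


# ASSUMPTION: only these categories block a merge.
BLOCKING_CATEGORIES = {"regression", "security"}


def blocking_checks(findings: list) -> list:
    """Format each blocking finding as one inline check message."""
    return [
        f"{f.file}:{f.line} [{f.category}] {f.message}"
        for f in findings
        if f.category in BLOCKING_CATEGORIES
    ]


def may_merge(findings: list) -> bool:
    """A branch may merge only when no blocking finding remains."""
    return not blocking_checks(findings)
```

Keeping the blocking set narrow matches the "findings only" intent: style remarks can still be reported without holding up the merge.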

What the Future Looks Like (Practical Prediction)

The next gains will come less from larger prompts and more from better systems.

Likely trajectory

stronger task specs and task leasing
parallel worktree orchestration
dedicated reviewer/security agents
merge queues and policy gates
ephemeral per-agent environments (DB seed + preview URL)
continuous maintenance automations

Human role in the future

define constraints, not line-by-line code
set acceptance criteria and risk tolerance
approve architectural changes
review ambiguous tradeoffs and product direction
design the verification system

mindmap
  root((Future AI Engineering))
    Orchestration
      Planner
      Task leasing
      Worktrees
      Merge queue
    Verification
      Typecheck
      Tests
      Build
      Security review
      Runtime checks
    Operations
      Preview envs
      Seeded DBs
      Rollbacks
      Observability
    Human Oversight
      Architecture
      Prioritization
      Risk decisions
      Product intent

Recommended Next Steps for This Repo

Add a worktree-aware RALPH runner (/.ralph/worktree-loop.sh)
Introduce task leasing to prevent two agents from taking the same task
Add a review mode (diff review + findings only)
Add a merge gate workflow for multi-branch agent output
Gradually move progress.txt task specs to a stricter structure (can still stay human-readable)

Operating Principle

As autonomy increases, prompt quality matters less than:

task clarity
isolation
verification
review discipline
merge/deploy guardrails
