
@chepetime
Created February 25, 2026 03:09

AI Coding Evolution

This document describes an engineering maturity model for AI-assisted development, from manual coding through parallel autonomous execution across worktrees/branches to continuous autonomous engineering.

It is intended to help us:

  • name the current operating mode clearly
  • understand the risks and constraints of each mode
  • define the next capability to add (instead of just "using more AI")
  • keep verification, review, and deployment discipline as autonomy increases

The Ladder

  1. Manual Coding
  2. Manual AI Assistance (IDE Integration)
  3. Supervised AI Coding (CLI / RALPH)
  4. Unsupervised AI Coding (RALPH AFK / scheduled loops)
  5. Parallel Unsupervised AI Coding (worktrees / one agent per branch)
  6. Orchestrated Multi-Agent Delivery (planner + workers + reviewer)
  7. Continuous Autonomous Engineering (always-on maintenance + product iteration)

```mermaid
flowchart LR
  A["Manual Coding"] --> B["Manual AI Assistance (IDE)"]
  B --> C["Supervised AI Coding (CLI / RALPH)"]
  C --> D["Unsupervised AI Coding (AFK loops)"]
  D --> E["Parallel Unsupervised AI Coding (Worktrees)"]
  E --> F["Orchestrated Multi-Agent Delivery"]
  F --> G["Continuous Autonomous Engineering"]
```

1. Manual Coding

Human writes the code directly. Tooling is traditional: editor, tests, debugger, docs, CI.

Example

  • Implement POST /api/places/search manually.
  • Read PRD.md, then write the route, client, cache logic, and UI updates.
  • Run pnpm typecheck, pnpm lint, pnpm test:run, and pnpm build.
  • Commit and push.
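
The validate step in this example is an ordered list of commands where the first failure stops the run. Below is a minimal TypeScript sketch of that fail-fast gate, assuming a hypothetical `CommandRunner` injection point (a real script might wrap `child_process.spawnSync`) so the ordering logic can be exercised without actually running pnpm.

```typescript
// Hypothetical injection point: runs one command line and returns its exit code.
type CommandRunner = (command: string) => number;

interface ValidationResult {
  ok: boolean;
  failedCommand?: string;
  ran: string[];
}

// The validation commands from this example, in the order they are run.
const VALIDATION_COMMANDS: readonly string[] = [
  "pnpm typecheck",
  "pnpm lint",
  "pnpm test:run",
  "pnpm build",
];

// Runs each command in order and stops at the first non-zero exit code.
function validate(
  run: CommandRunner,
  commands: readonly string[] = VALIDATION_COMMANDS,
): ValidationResult {
  const ran: string[] = [];
  for (const command of commands) {
    ran.push(command);
    if (run(command) !== 0) {
      return { ok: false, failedCommand: command, ran };
    }
  }
  return { ok: true, ran };
}

// Usage with a fake runner in which lint fails.
const result = validate((cmd) => (cmd === "pnpm lint" ? 1 : 0));
console.log(result.ok, result.failedCommand, result.ran.length); // prints: false pnpm lint 2
```

Cheaper checks run first, so a lint failure never pays for a full build.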

Strengths

  • highest control
  • best for subtle architecture changes
  • easiest to reason about ownership and intent

Limitations

  • slowest iteration speed
  • repetitive scaffolding work consumes focus
  • context switching cost remains high

```mermaid
flowchart TD
  H["Human"] --> P["Plan"]
  P --> I["Implement"]
  I --> V["Validate"]
  V --> C["Commit / Push"]
```

2. Manual AI Assistance (IDE Integration)

Human remains the driver. AI helps with local edits, refactors, snippets, and explanations inside the editor.

Example

  • Ask the IDE assistant to refactor a React component into a server component wrapper with a client child.
  • Ask for a type-safe mapper from a Google Places response to PlaceRecord.
  • Human reviews and adjusts the output before saving.
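
For the mapper request, the output worth accepting is a narrow input type plus an explicit mapping function that refuses to guess. The sketch below assumes a small subset of the Google Places API (New) place object (`id`, `displayName.text`, `formattedAddress`, `location`, `rating`); the `PlaceRecord` shape is hypothetical, since the repo's real type is not shown here.

```typescript
// Assumed subset of a Places API (New) place object. Every field is optional
// because a response only includes the fields requested in the field mask.
interface GooglePlace {
  id?: string;
  displayName?: { text?: string; languageCode?: string };
  formattedAddress?: string;
  location?: { latitude?: number; longitude?: number };
  rating?: number;
}

// Hypothetical internal shape; the repo's actual PlaceRecord may differ.
interface PlaceRecord {
  placeId: string;
  name: string;
  address: string | null;
  lat: number;
  lng: number;
  rating: number | null;
}

// Returns null when a required field is missing instead of inventing defaults.
function toPlaceRecord(place: GooglePlace): PlaceRecord | null {
  const name = place.displayName?.text;
  const lat = place.location?.latitude;
  const lng = place.location?.longitude;
  if (!place.id || !name || lat === undefined || lng === undefined) {
    return null;
  }
  return {
    placeId: place.id,
    name,
    address: place.formattedAddress ?? null,
    lat,
    lng,
    rating: place.rating ?? null,
  };
}

// Usage with made-up values.
const record = toPlaceRecord({
  id: "example-place-id",
  displayName: { text: "Cafe Example" },
  location: { latitude: 19.43, longitude: -99.13 },
});
console.log(record?.name, record?.address); // prints: Cafe Example null
```

This is the kind of diff the human review step should check: which fields are required, and what happens when the field mask omits one.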

Strengths

  • very fast for local transformations
  • low setup friction
  • good for explanations and small refactors

Limitations

  • weak long-running execution model
  • poor task memory across sessions unless documented
  • easy to produce inconsistent patterns without project rules

```mermaid
sequenceDiagram
  participant Dev as Developer
  participant IDE as IDE
  participant AI as AI Assistant
  Dev->>AI: "Refactor this component"
  AI-->>IDE: Suggested edits
  Dev->>IDE: Accept / modify / reject
  Dev->>IDE: Run tests locally
```

3. Supervised AI Coding (CLI / RALPH)

Human delegates one scoped task to an agent via CLI, then reviews the diff and validation before deciding to commit or iterate.

This is the current core workflow for this repo.

Example

  • Run /.ralph/run.sh --agent codex --mode feature --no-commit
  • Agent selects one concrete task from progress.txt
  • Agent implements + validates + updates progress.txt
  • Human reviews output and reruns if needed
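
The "one concrete task" rule is what keeps a supervised run small enough to review. The actual progress.txt format is not shown in this document, so the sketch below assumes a hypothetical markdown-style checklist (`- [ ]` open, `- [x]` done) and selects the first open item.

```typescript
interface ProgressTask {
  line: number; // 1-based line number in progress.txt
  title: string;
  done: boolean;
}

// Hypothetical format: one task per line, "- [ ] title" (open) or "- [x] title" (done).
const TASK_LINE = /^\s*-\s\[([ xX])\]\s+(.+)$/;

function parseProgress(text: string): ProgressTask[] {
  const tasks: ProgressTask[] = [];
  text.split(/\r?\n/).forEach((raw, index) => {
    const match = TASK_LINE.exec(raw);
    if (match) {
      tasks.push({
        line: index + 1,
        title: match[2].trim(),
        done: match[1].toLowerCase() === "x",
      });
    }
  });
  return tasks;
}

// One task per run: the first open item, or null when nothing is left.
function selectNextTask(text: string): ProgressTask | null {
  return parseProgress(text).find((task) => !task.done) ?? null;
}

// Usage with a made-up backlog.
const progress = [
  "- [x] Implement POST /api/places/search",
  "- [ ] P5 density grid calculation",
  "- [ ] Playwright + DB seed scripts",
].join("\n");
console.log(selectNextTask(progress)?.title); // prints: P5 density grid calculation
```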

Strengths

  • strong task focus when paired with progress.txt
  • reproducible prompt + logs
  • good balance of speed and oversight

Limitations

  • quality depends heavily on task clarity
  • prompt drift can cause inconsistent behavior
  • single agent means serial throughput

```mermaid
flowchart TD
  U["Human picks mode"] --> R["RALPH run (1 task)"]
  R --> X["Agent edits + validates"]
  X --> S["Update progress.txt"]
  S --> Review["Human review"]
  Review -->|good| Commit["Commit / Push"]
  Review -->|needs work| Retry["Refine prompt/spec and rerun"]
```

4. Unsupervised AI Coding (RALPH AFK / Loops)

Agent runs repeatedly without per-iteration human input. The loop enforces scope, validation, and progress logging.

Example

  • Run /.ralph/codex-loop.sh 50 3
  • Each iteration:
    • run one RALPH task
    • validate
    • update progress.txt
    • commit/push if successful
  • Stop early after repeated no-progress iterations
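
The loop's stopping rule can be written as a small controller. This sketch assumes "progress" means that HEAD moved or the working-tree state changed, as in this stage's diagram; `Snapshot`, `runIteration`, `maxIterations`, and `maxNoProgress` are hypothetical names, and the sketch does not assume what the two arguments of `codex-loop.sh` actually mean.

```typescript
// Hypothetical repo snapshot taken after each iteration.
interface Snapshot {
  head: string; // commit SHA of HEAD
  worktreeDigest: string; // e.g. a hash of `git status --porcelain` output
}

interface LoopOptions {
  maxIterations: number;
  maxNoProgress: number; // consecutive stalled iterations before stopping early
}

interface LoopOutcome {
  iterations: number;
  reason: "max-iterations" | "no-progress";
}

// `runIteration` stands in for: run one RALPH task, validate, update progress.txt,
// commit/push on success. Sleeping between iterations is omitted.
function runAfkLoop(
  runIteration: (n: number) => void,
  snapshot: () => Snapshot,
  opts: LoopOptions,
): LoopOutcome {
  let previous = snapshot();
  let stalled = 0;
  for (let n = 1; n <= opts.maxIterations; n++) {
    runIteration(n);
    const current = snapshot();
    // Progress = HEAD moved (new commit) or the working tree changed.
    const progressed =
      current.head !== previous.head ||
      current.worktreeDigest !== previous.worktreeDigest;
    stalled = progressed ? 0 : stalled + 1;
    previous = current;
    if (stalled >= opts.maxNoProgress) {
      return { iterations: n, reason: "no-progress" };
    }
  }
  return { iterations: opts.maxIterations, reason: "max-iterations" };
}

// Usage with a fake repo whose HEAD advances on the first two iterations only.
let fakeHead = 0;
const outcome = runAfkLoop(
  (n) => {
    if (n <= 2) fakeHead++;
  },
  () => ({ head: `sha-${fakeHead}`, worktreeDigest: "clean" }),
  { maxIterations: 50, maxNoProgress: 3 },
);
console.log(outcome.iterations, outcome.reason); // prints: 5 no-progress
```

Counting consecutive stalls (and resetting on any progress) tolerates an occasional empty iteration while still cutting off a loop that is stuck on a bad task spec.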

Strengths

  • high throughput on well-specified backlogs
  • good for implementation queues and cleanup tasks
  • builds a documented execution trail in progress.txt

Limitations

  • bad task specs waste many cycles
  • runtime/integration issues can recur without human intervention
  • requires strict guardrails (validation, commit scope, logs)

```mermaid
flowchart TD
  Start["AFK loop start"] --> I1["Iteration n"]
  I1 --> Run["Run RALPH (one task)"]
  Run --> Val["Validate"]
  Val --> Prog["Update progress.txt"]
  Prog --> Check["Progress detected? (worktree or HEAD)"]
  Check -->|yes| Next["Sleep and continue"]
  Check -->|no, repeated| Stop["Stop early"]
  Next --> I1
```

5. Parallel Unsupervised AI Coding (Worktrees / One Agent Per Branch)

Multiple agents run simultaneously, each in an isolated worktree and branch, usually one task per worktree.

This is the natural next step after AFK loops.

Example

  • Worktree A: P5 density grid calculation
  • Worktree B: Playwright + DB seed scripts
  • Worktree C: Better Auth key encryption at rest
  • Each agent runs its own RALPH loop and commits to its branch
  • A human or reviewer agent merges through a merge queue after CI passes
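
Creating one isolated worktree and branch per task is plain `git worktree add -b <branch> <path> <base>`. The helper below only builds the command strings and does not execute them; the `../wt-<branch>` directory layout and the `main` base branch are assumptions for illustration.

```typescript
interface WorktreeTask {
  branch: string; // e.g. "p5-density"
  summary: string;
}

// Builds one `git worktree add` command per task. Paths are siblings of the main
// checkout, so no two agents ever share a working tree.
function worktreeCommands(tasks: WorktreeTask[], base = "main"): string[] {
  const seen = new Set<string>();
  return tasks.map((task) => {
    // Restrict names to lowercase letters, digits, "-" and "_" so they are safe
    // both as git branch names and inside shell commands.
    if (!/^[a-z0-9][a-z0-9_-]*$/.test(task.branch)) {
      throw new Error(`unsafe branch name: ${task.branch}`);
    }
    if (seen.has(task.branch)) {
      throw new Error(`duplicate branch: ${task.branch}`);
    }
    seen.add(task.branch);
    return `git worktree add -b ${task.branch} ../wt-${task.branch} ${base}`;
  });
}

const commands = worktreeCommands([
  { branch: "p5-density", summary: "P5 density grid calculation" },
  { branch: "p6-playwright", summary: "Playwright + DB seed scripts" },
  { branch: "p7-encryption", summary: "Better Auth key encryption at rest" },
]);
console.log(commands[0]); // prints: git worktree add -b p5-density ../wt-p5-density main
```

After a branch merges, `git worktree remove ../wt-p5-density` deletes that working tree, and `git worktree list` shows which ones are still active.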

Why worktrees matter

  • isolation of dependencies and diffs
  • avoids task collisions in the same working tree
  • makes parallel execution inspectable and reversible

New risks

  • two agents modify the same files or concepts
  • branch drift / merge conflicts
  • duplicated effort without task leasing
  • CI capacity and cost spikes
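
Task leasing addresses the duplicated-effort risk: an agent must hold an unexpired lease before working on a task, and a crashed agent's lease lapses so the task can be picked up again. This in-memory sketch shows only the semantics; the class and method names are hypothetical, and agents in separate processes would need the same logic on shared storage.

```typescript
interface Lease {
  taskId: string;
  agentId: string;
  expiresAt: number; // epoch milliseconds
}

// In-memory illustration only: parallel agents in separate processes would need
// the same logic on shared storage (for example a database row per task).
class LeaseTable {
  private readonly leases = new Map<string, Lease>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Grants (or renews) the lease unless another agent holds an unexpired one.
  acquire(taskId: string, agentId: string, now: number): boolean {
    const current = this.leases.get(taskId);
    if (current && current.expiresAt > now && current.agentId !== agentId) {
      return false;
    }
    this.leases.set(taskId, { taskId, agentId, expiresAt: now + this.ttlMs });
    return true;
  }

  // Only the current holder can release; anyone else's call is a no-op.
  release(taskId: string, agentId: string): void {
    if (this.leases.get(taskId)?.agentId === agentId) {
      this.leases.delete(taskId);
    }
  }
}

// Usage: a 30-minute TTL (an assumed value) with explicit timestamps.
const MINUTE = 60_000;
const table = new LeaseTable(30 * MINUTE);
console.log(table.acquire("p5-density", "agent-a", 0)); // prints: true
console.log(table.acquire("p5-density", "agent-b", 1 * MINUTE)); // prints: false (lease still live)
console.log(table.acquire("p5-density", "agent-b", 31 * MINUTE)); // prints: true (lease expired)
```

Expiry is the important design choice: an agent that crashes mid-task never calls `release`, so the TTL should comfortably exceed one loop iteration.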

```mermaid
flowchart TB
  Planner["Task Queue / Planner"] --> W1["Worktree A (branch: p5-density)"]
  Planner --> W2["Worktree B (branch: p6-playwright)"]
  Planner --> W3["Worktree C (branch: p7-encryption)"]

  W1 --> CI1["CI"]
  W2 --> CI2["CI"]
  W3 --> CI3["CI"]

  CI1 --> MQ["Merge Queue"]
  CI2 --> MQ
  CI3 --> MQ

  MQ --> Main["main"]
```

6. Orchestrated Multi-Agent Delivery

Roles become specialized. One agent plans, multiple agents implement, another agent reviews, and a merge gate decides what lands.

Example role split

  • Planner agent:
    • decomposes PRD into parallelizable tasks
    • writes/refines task specs in progress.txt or a structured tasks file
  • Worker agents:
    • implement one task per worktree
    • run validation + commit
  • Reviewer agent:
    • reviews diffs for regressions/security risks
    • confirms task acceptance criteria
  • Merge gate:
    • CI required
    • reviewer findings resolved
    • conflict-free rebase / merge queue
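
The merge gate's three conditions fit in a single predicate that also reports why a branch is blocked, which is more useful to a worker agent than a bare yes/no. The `BranchStatus` fields are hypothetical; in practice they would be filled from the CI provider, the reviewer agent's output, and a trial rebase.

```typescript
// Hypothetical inputs; in practice these come from CI, the reviewer agent, and a trial rebase.
interface BranchStatus {
  branch: string;
  ciPassed: boolean;
  unresolvedFindings: number;
  rebasesCleanlyOnMain: boolean;
}

interface GateDecision {
  merge: boolean;
  reasons: string[]; // empty when merge is true
}

// Every condition must hold; all failing conditions are reported, not just the first.
function mergeGate(status: BranchStatus): GateDecision {
  const reasons: string[] = [];
  if (!status.ciPassed) reasons.push("CI has not passed");
  if (status.unresolvedFindings > 0) {
    reasons.push(`${status.unresolvedFindings} reviewer finding(s) unresolved`);
  }
  if (!status.rebasesCleanlyOnMain) reasons.push("does not rebase cleanly onto main");
  return { merge: reasons.length === 0, reasons };
}

const decision = mergeGate({
  branch: "p7-encryption",
  ciPassed: true,
  unresolvedFindings: 1,
  rebasesCleanlyOnMain: true,
});
console.log(decision.merge, decision.reasons.join("; ")); // prints: false 1 reviewer finding(s) unresolved
```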

Strengths

  • much higher throughput with bounded risk
  • better separation of implementation vs review
  • easier to scale across engineering surfaces (frontend, backend, infra)

Limitations

  • needs orchestration tooling (task leases, worktree lifecycle, merge queue)
  • requires stronger structured task specs than freeform prompts
  • verification quality becomes the bottleneck

```mermaid
flowchart LR
  P["Planner Agent"] --> T["Task Specs"]
  T --> A["Worker A"]
  T --> B["Worker B"]
  T --> C["Worker C"]
  A --> R["Reviewer Agent"]
  B --> R
  C --> R
  R --> G["Merge Gate (CI + policy)"]
  G --> M["main"]
```

7. Continuous Autonomous Engineering

AI systems continuously maintain and improve the codebase, not only when manually invoked.

Humans remain accountable for product direction, architecture, risk decisions, and policy.

Example streams

  • dependency upgrades and compatibility fixes
  • flaky test detection and stabilization
  • security review/patch loops
  • docs drift detection
  • performance regression checks
  • backlog grooming from production feedback
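
With always-on streams, a common failure is the same signal spawning a new task every time it fires (one flaky test failing on every CI run, for example). A minimal sketch of the triage step, assuming each signal carries a stable fingerprint, folds repeats into one queued task; the `Signal` and `QueuedTask` shapes and the stream names are hypothetical labels for the streams listed above.

```typescript
type Stream = "dependency" | "flaky-test" | "security" | "docs-drift" | "performance" | "feedback";

interface Signal {
  stream: Stream;
  fingerprint: string; // stable identity, e.g. a test name or advisory ID
  detail: string;
}

interface QueuedTask {
  key: string;
  stream: Stream;
  occurrences: number;
  detail: string;
}

// Folds repeated signals into one open task instead of enqueueing duplicates.
function triage(queue: Map<string, QueuedTask>, signals: Signal[]): Map<string, QueuedTask> {
  for (const signal of signals) {
    const key = `${signal.stream}:${signal.fingerprint}`;
    const existing = queue.get(key);
    if (existing) {
      existing.occurrences += 1;
    } else {
      queue.set(key, { key, stream: signal.stream, occurrences: 1, detail: signal.detail });
    }
  }
  return queue;
}

// Usage with made-up signals: the same flaky test fails on two CI runs.
const queue = triage(new Map(), [
  { stream: "flaky-test", fingerprint: "search.spec.ts > caches results", detail: "timeout" },
  { stream: "flaky-test", fingerprint: "search.spec.ts > caches results", detail: "timeout" },
  { stream: "security", fingerprint: "ADVISORY-1", detail: "vulnerable transitive dependency" },
]);
console.log(queue.size); // prints: 2
```

The occurrence count doubles as a cheap prioritization signal for the planner.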

What changes operationally

  • automation runs become first-class artifacts
  • agents need durable memory and task state
  • observability/evals are as important as prompts
  • rollback and release controls become mandatory

```mermaid
flowchart TD
  Signals["Signals: CI, prod metrics, errors, feedback"] --> Triage["Planner / Triage Agent"]
  Triage --> Queue["Task Queue"]
  Queue --> Workers["Worker Agents (parallel)"]
  Workers --> Review["Reviewer / Policy Agents"]
  Review --> Deploy["Deploy Gate"]
  Deploy --> Prod["Production"]
  Prod --> Signals
```

What Is Missing Between Stages (Useful Concepts)

These are cross-cutting capabilities that usually appear before or during the later stages.

Programmatic AI Coding

Agents are driven by scripts and structured I/O rather than by ad hoc prompts alone.

Example

  • CI job generates a task payload JSON.
  • Runner executes the agent with a fixed prompt template and a machine-readable result contract.
  • Pipeline parses the result and routes it to review or retry.
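
The key piece is the result contract: the pipeline should never have to interpret free-form agent prose. The sketch below assumes a hypothetical JSON contract (`task_id`, `status`, `validation`, `summary`) and routes each result to review, retry, or escalation; none of these field names come from an existing tool.

```typescript
// Hypothetical contract the prompt template asks the agent to emit as JSON.
interface AgentResult {
  task_id: string;
  status: "done" | "blocked" | "failed";
  validation: { command: string; exitCode: number }[];
  summary: string;
}

type Route = "review" | "retry" | "escalate";

// Returns null for anything that does not match the contract.
function parseResult(raw: string): AgentResult | null {
  try {
    const value: unknown = JSON.parse(raw);
    if (typeof value !== "object" || value === null) return null;
    const r = value as Partial<AgentResult>;
    if (typeof r.task_id !== "string" || !Array.isArray(r.validation)) return null;
    if (r.status !== "done" && r.status !== "blocked" && r.status !== "failed") return null;
    return r as AgentResult;
  } catch {
    return null; // not JSON at all
  }
}

function routeResult(raw: string, attempt: number, maxAttempts = 3): Route {
  const result = parseResult(raw);
  const canRetry = attempt < maxAttempts;
  if (result === null) return canRetry ? "retry" : "escalate";
  if (result.status === "blocked") return "escalate"; // needs a human decision, not another run
  const validationPassed = result.validation.every((v) => v?.exitCode === 0);
  if (result.status === "done" && validationPassed) return "review";
  return canRetry ? "retry" : "escalate";
}

// Usage with made-up agent output.
const doneOutput = JSON.stringify({
  task_id: "p5-density",
  status: "done",
  validation: [{ command: "pnpm test:run", exitCode: 0 }],
  summary: "Added density grid calculation",
});
console.log(routeResult(doneOutput, 1)); // prints: review
console.log(routeResult("I think it works", 1)); // prints: retry
console.log(routeResult("I think it works", 3)); // prints: escalate
```

Treating contract violations as failed runs keeps the retry budget honest: an agent that stops emitting valid JSON is escalated after `maxAttempts` rather than looping forever.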

Spec-Driven AI Coding

Tasks become machine-checkable and explicit (acceptance criteria, files, validation, out-of-scope).

Example

  • progress.txt or tasks.json contains:
    • goal
    • scope
    • validation commands
    • done criteria
  • Agent is only allowed to pick tasks that meet the spec format.
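
Enforcing "only pick tasks that meet the spec format" can be a plain structural check run before any agent sees the backlog. The sketch assumes a hypothetical tasks.json entry built from the four fields listed here plus an optional out-of-scope list; the field names are illustrative, not an existing schema.

```typescript
// Illustrative shape for one tasks.json entry; not an existing schema.
interface TaskSpec {
  id: string;
  goal: string;
  scope: string[]; // files or areas the task may touch
  validation: string[]; // commands that must exit 0
  done: string[]; // acceptance criteria
  outOfScope?: string[];
}

const isText = (x: unknown): boolean => typeof x === "string" && x.trim() !== "";
const isTextList = (xs: unknown): boolean =>
  Array.isArray(xs) && xs.length > 0 && xs.every(isText);

// Returns the reasons a spec is incomplete; an empty array means an agent may pick it.
function specProblems(spec: Partial<TaskSpec>): string[] {
  const problems: string[] = [];
  if (!isText(spec.id)) problems.push("missing id");
  if (!isText(spec.goal)) problems.push("missing goal");
  if (!isTextList(spec.scope)) problems.push("missing scope");
  if (!isTextList(spec.validation)) problems.push("missing validation commands");
  if (!isTextList(spec.done)) problems.push("missing done criteria");
  return problems;
}

function eligibleTasks(specs: Partial<TaskSpec>[]): Partial<TaskSpec>[] {
  return specs.filter((spec) => specProblems(spec).length === 0);
}

// Usage with made-up specs.
const specs: Partial<TaskSpec>[] = [
  {
    id: "p5-density",
    goal: "P5 density grid calculation",
    scope: ["src/lib/density.ts"], // hypothetical path
    validation: ["pnpm typecheck", "pnpm test:run"],
    done: ["grid output covered by unit tests"],
  },
  { id: "p6-playwright", goal: "Playwright + DB seed scripts" },
];
console.log(eligibleTasks(specs).map((s) => s.id).join(", ")); // prints: p5-density
console.log(specProblems(specs[1]).join("; "));
// prints: missing scope; missing validation commands; missing done criteria
```

Returning the list of problems (rather than a boolean) gives the planner agent something concrete to fix when it refines a spec.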

Review-First AI Coding

A reviewer agent runs before merge, even when implementation succeeded.

Example

  • Worker commits feature branch
  • Reviewer agent runs with a "find regressions/security issues only" instruction
  • Inline findings become blocking checks
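
Turning reviewer output into a blocking check mostly means agreeing on a severity threshold. This sketch assumes a hypothetical finding shape; by default every finding blocks, matching the example, and the `minBlocking` parameter is an assumed knob for relaxing that policy.

```typescript
type Severity = "low" | "medium" | "high" | "critical";

interface Finding {
  file: string;
  line: number;
  severity: Severity;
  kind: "regression" | "security";
  message: string;
}

const RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2, critical: 3 };

interface CheckResult {
  conclusion: "success" | "failure";
  blocking: Finding[];
  advisory: Finding[];
}

// Findings at or above `minBlocking` fail the check; the rest are reported as advisory.
// The default "low" makes every finding blocking.
function findingsToCheck(findings: Finding[], minBlocking: Severity = "low"): CheckResult {
  const blocking = findings.filter((f) => RANK[f.severity] >= RANK[minBlocking]);
  const advisory = findings.filter((f) => RANK[f.severity] < RANK[minBlocking]);
  return { conclusion: blocking.length > 0 ? "failure" : "success", blocking, advisory };
}

// Usage with made-up findings.
const findings: Finding[] = [
  { file: "src/auth/keys.ts", line: 42, severity: "high", kind: "security", message: "key material logged in plaintext" },
  { file: "src/ui/Map.tsx", line: 10, severity: "low", kind: "regression", message: "loading state no longer shown" },
];
console.log(findingsToCheck(findings).conclusion); // prints: failure
console.log(findingsToCheck(findings, "critical").conclusion); // prints: success
```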

What the Future Looks Like (Practical Prediction)

The next gains will come less from larger prompts and more from better systems.

Likely trajectory

  • stronger task specs and task leasing
  • parallel worktree orchestration
  • dedicated reviewer/security agents
  • merge queues and policy gates
  • ephemeral per-agent environments (DB seed + preview URL)
  • continuous maintenance automations

Human role in the future

  • define constraints, not line-by-line code
  • set acceptance criteria and risk tolerance
  • approve architectural changes
  • review ambiguous tradeoffs and product direction
  • design the verification system

```mermaid
mindmap
  root((Future AI Engineering))
    Orchestration
      Planner
      Task leasing
      Worktrees
      Merge queue
    Verification
      Typecheck
      Tests
      Build
      Security review
      Runtime checks
    Operations
      Preview envs
      Seeded DBs
      Rollbacks
      Observability
    Human Oversight
      Architecture
      Prioritization
      Risk decisions
      Product intent
```

Recommended Next Steps for This Repo

  1. Add a worktree-aware RALPH runner (/.ralph/worktree-loop.sh)
  2. Introduce task leasing to prevent two agents from taking the same task
  3. Add a review mode (diff review + findings only)
  4. Add a merge gate workflow for multi-branch agent output
  5. Gradually move progress.txt task specs to a stricter structure (can still stay human-readable)

Operating Principle

As autonomy increases, prompt quality matters less than:

  • task clarity
  • isolation
  • verification
  • review discipline
  • merge/deploy guardrails