AI Code Review Pipeline

This is a multi-agent code review system built on Claude Code and claude -p (the non-interactive CLI). It reviews large PRs by breaking them into groups, examining them in dependency order, and producing a verified findings report.

Agent architecture

The system uses three types of Claude sessions, each with a custom system prompt and different tool access:

  • Tool-free reviewer agents (claude -p --tools "") — no file access, no code execution. They receive everything they need as text input: diffs, focus instructions, known facts, PR description. They can only read and reason. This constraint is intentional — it prevents reviewers from wandering through the codebase and forces all context to be curated upfront. Their system prompt defines what to flag (logic errors, leaks, races, broken API contracts), what to skip (style, missing tests), and critically — what to do when uncertain (emit a QUESTION finding with a specific verifiable question, rather than building on assumptions). A dispatch sketch follows this list.
  • The executor — a Claude Code session with a custom system prompt and restricted tools (Read, Write, Edit, Bash, Grep, Glob). It orchestrates the review: assembles reviewer inputs, dispatches agents, verifies findings against the codebase between turns, and manages the multi-turn flow. Its system prompt emphasizes stopping after each step to report results, not loading diffs into its own context, and monitoring its context growth.
  • The designer/coordinator — a Claude Code session that plans the review (grouping, exclusions, known-facts pre-verification), audits the plan's consistency, then stays active as the coordinator and auditor.
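
For concreteness, here is a minimal sketch of how an executor might dispatch one such tool-free reviewer turn. The only flag taken from this description is --tools ""; the system-prompt flag and all helper names are assumptions, not the pipeline's actual code.

```python
import subprocess

def run_reviewer_turn(system_prompt: str, turn_input: str) -> str:
    """Dispatch one tool-free reviewer turn via `claude -p`.

    `--tools ""` strips all tool access, so the reviewer can only read and
    reason over the curated text it receives on stdin. The flag used to
    supply the custom system prompt is an assumption (CLI versions differ).
    """
    result = subprocess.run(
        ["claude", "-p", "--tools", "", "--append-system-prompt", system_prompt],
        input=turn_input,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

# Hypothetical usage: everything the reviewer needs arrives as curated text.
# turn_input = "\n\n".join([pr_description, known_facts, focus_notes, diff_text])
# findings = run_reviewer_turn(reviewer_system_prompt, turn_input)
```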

A synthesis agent (also tool-free, with its own system prompt) merges findings into the final report. Its prompt emphasizes grouping by narrative rather than severity, preserving evidence, and not making merge recommendations.

Two phases

The review runs in two phases, each handled by its own session. The two sessions end up active concurrently, because the design session stays on as coordinator while the execution session runs.

The design session takes a branch and produces a self-contained review plan. It maps the branch with git diff --numstat, classifies files by subsystem, groups them by review concern, identifies exclusions (binaries, already-reviewed code, transferred code), and pre-verifies technical facts that reviewers would otherwise waste turns asking about (e.g., "does this API clean up on overwrite?"). The output is a review directory with filtered diffs, context files, known facts, and a startup document specifying exactly how the review should proceed. This session then stays active as the coordinator and auditor.
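
A plausible shape for that review directory, sketched in Python. Every file and directory name here is invented for illustration; the write-up specifies the kinds of artifacts, not their layout.

```python
from pathlib import Path

def scaffold_review_dir(root: Path, groups: list[str]) -> None:
    """Lay out a self-contained review plan: filtered diffs per group,
    context files, pre-verified known facts, and a startup document."""
    (root / "diffs").mkdir(parents=True, exist_ok=True)
    (root / "context").mkdir(exist_ok=True)
    for group in groups:
        (root / "diffs" / f"{group}.diff").touch()   # filtered diff per review group
    (root / "known_facts.md").touch()                # pre-verified technical facts
    (root / "startup.md").touch()                    # how the execution session proceeds

# scaffold_review_dir(Path("review/branch-x"), ["gpu_abstraction", "render_pipeline"])
```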

The execution session reads the startup document and runs the plan. It dispatches the tool-free reviewer agents, manages the multi-turn conversation flow, and verifies findings against the codebase between turns. Each reviewer is a multi-turn claude -p session that receives the relevant diffs, focus instructions, known facts, and context. The reviewer examines the code, reports findings, and flags uncertainties as explicit QUESTIONs — rather than building analysis on unverified assumptions. Between turns, the executor verifies those questions by reading the actual codebase, then feeds answers back into the reviewer's next turn.
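
The turn loop can be sketched abstractly on top of run_reviewer_turn from earlier. The QUESTION line format and both helpers below are assumptions; a real implementation would also resume the same claude -p session between turns rather than resend the base input.

```python
def extract_questions(output: str) -> list[str]:
    """Pull explicit QUESTION findings from a reviewer turn.
    Assumes a line-oriented `QUESTION: ...` convention (hypothetical)."""
    return [line for line in output.splitlines() if line.startswith("QUESTION:")]

def answer_question(question: str) -> str:
    """Executor-side resolution: read the referenced code, run a test, or
    check the PR description. Placeholder body; the real work happens in
    the executor session with its Read/Grep/Bash tools."""
    return f"ANSWER to {question!r}: (verified against the codebase)"

def review_groups(system_prompt: str, base_input: str, group_diffs: list[str]) -> list[str]:
    """Drive one reviewer through its groups, resolving QUESTIONs between turns."""
    transcript: list[str] = []
    answers: list[str] = []
    for diff in group_diffs:
        turn_input = "\n\n".join([base_input, *answers, diff])
        output = run_reviewer_turn(system_prompt, turn_input)
        transcript.append(output)
        answers = [answer_question(q) for q in extract_questions(output)]
    return transcript
```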

Reviewer structure

The system dispatches one or more sequential reviewers depending on natural splitting points in the dependency graph. For PRs that span foundation and consumer code, this typically means:

  • Reviewer 1 examines foundations (e.g., the GPU abstraction layer, import pipeline, entity system). It produces findings plus an API changes summary describing what changed at each interface.
  • Reviewer 2 examines consumers (e.g., the rendering pipeline, shaders, application integration). It receives Reviewer 1's findings and API summary as input, so it can verify that consumer code correctly uses the foundation APIs.

Groups that need to be seen together belong in the same reviewer. Groups that can be reasonably split go to separate reviewers — the system isn't limited to two.
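
Chaining two reviewers in dependency order then looks roughly like this. Group names echo the examples above; reviewer_prompt and the base-input variables are placeholders, and review_groups is the loop sketched earlier.

```python
foundation_groups = ["gpu_abstraction", "import_pipeline", "entity_system"]
consumer_groups = ["render_pipeline", "shaders", "app_integration"]

# Reviewer 1: foundations. Its transcript carries findings plus the
# API changes summary describing what changed at each interface.
r1_output = review_groups(reviewer_prompt, foundation_base_input, foundation_groups)

# Reviewer 2: consumers. Reviewer 1's output is folded into the input so
# consumer code can be checked against the actual foundation API changes.
consumer_input = "\n\n".join([consumer_base_input, *r1_output])
r2_output = review_groups(reviewer_prompt, consumer_input, consumer_groups)
```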

Each reviewer goes through examination turns (one group per turn), then closing turns (a minimal encoding of this plan follows the list):

  • Reflection — reconsider all findings with the full picture, catch cross-group patterns
  • Summary — complete findings report
  • Post-mortem — what was hard to assess, what context was missing, what would help next time
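
One minimal way to encode that per-reviewer plan (the turn labels are invented):

```python
def turn_plan(groups: list[str]) -> list[str]:
    """Examination turns, one group per turn, then the fixed closing turns."""
    return [f"examine:{g}" for g in groups] + ["reflection", "summary", "post-mortem"]

# turn_plan(["gpu_abstraction", "import_pipeline"])
# -> ["examine:gpu_abstraction", "examine:import_pipeline",
#     "reflection", "summary", "post-mortem"]
```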

Verification

Findings are verified at two levels (the resulting annotation shape is sketched after this list):

  1. Between turns — the executor resolves QUESTION findings by reading code, running tests, or checking the PR description. Answers feed back into the reviewer's next turn.
  2. Before synthesis — after all reviewers complete, the executor independently verifies all substantive findings against the codebase. Confirmed, dismissed, or deferred (to the user) — each finding is annotated before going to synthesis.
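
The annotation step can be modeled with a small status enum and a tagged finding record; the field names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Verification(Enum):
    CONFIRMED = "confirmed"   # reproduced against the codebase
    DISMISSED = "dismissed"   # contradicted by the actual code
    DEFERRED = "deferred"     # left to the user to judge

@dataclass
class Finding:
    severity: str             # e.g. "high", "medium", "low"
    kind: str                 # e.g. "logic error", "leak", "race"
    location: str             # e.g. "src/renderer.c:412"
    description: str
    status: Verification
    evidence: str = ""        # what the executor checked to reach the verdict
```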

Synthesis

The synthesis agent merges the verified findings from all reviewers into a final report. It groups findings by narrative (what's wrong and why), preserves file:line references as clickable GitHub links, identifies cross-cutting patterns, and lists verified-clean areas.
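
The clickable links fall out of GitHub's standard blob URL shape (#L<line> anchors on /blob/<sha>/<path>); the helper itself is hypothetical.

```python
def github_link(repo: str, sha: str, ref: str) -> str:
    """Turn a 'path/to/file.c:412' reference into a GitHub source link."""
    path, line = ref.rsplit(":", 1)
    return f"https://github.com/{repo}/blob/{sha}/{path}#L{line}"

# github_link("owner/repo", "abc123", "src/renderer.c:412")
# -> "https://github.com/owner/repo/blob/abc123/src/renderer.c#L412"
```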

Closing

After the report is delivered, both the executor and coordinator sessions reflect on the process — what went well, what went wrong, cache behavior, cost observations, and suggestions for the next review. These post-mortems, combined with the reviewer post-mortems, feed into future review designs as accumulated knowledge about context gaps and process improvements.

What the report looks like

  • Summary table of all findings with severity, type, and linked source locations (one such row is rendered in the sketch after this list)
  • Grouped findings with detailed descriptions, evidence, and suggested fixes
  • Cross-cutting patterns — issues that appear across subsystems
  • Clean areas — everything that was verified correct (often the most valuable section for the PR author)
  • Limitations — what couldn't be verified and why, synthesized from reviewer post-mortems
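
Tying the earlier sketches together, one summary-table row might be rendered like this; the column choice is a guess, not the pipeline's actual format.

```python
def summary_row(f: Finding, repo: str, sha: str) -> str:
    """Render one finding as a markdown table row with a clickable location."""
    link = github_link(repo, sha, f.location)
    return f"| {f.severity} | {f.kind} | [{f.location}]({link}) | {f.description} |"
```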