| name | review |
|---|---|
| description | Exhaustive, evidence-driven code review over any scope — staged or unstaged changes, a branch diff, a PR, a single commit, a commit range, selected files/modules, or the whole repository. Surfaces every actionable issue (correctness, reliability, performance, maintainability, and architecture), walks the findings one at a time to collect a decision on each, then implements the chosen fixes with regression tests where a harness exists. Use when the user asks to review, audit, or critique staged/unstaged changes, a branch, a PR, a commit or range, specific files, or an entire repo. |
| user_invocable | true |
| disable-model-invocation | true |
| argument | Free-form description of what to review, such as "staged changes", "this PR", "main...HEAD", "abc1234", "Sources/Auth", or "entire repo". |
Perform an exhaustive, evidence-driven code review over whatever scope the user describes. Find all concrete issues worth acting on, from critical bugs through nits, inconsistencies, maintainability problems, and architectural concerns, then walk the user through findings one at a time, collecting a decision on each before moving to the next. Only after every finding has been decided do you implement the chosen fixes in a single batch.
Interpret the user's free-form argument and choose the narrowest review surface that matches it.
Common scopes:
| User says | Review scope |
|---|---|
| No argument | Staged changes if present; otherwise local unstaged changes if present; otherwise say there is no default diff and offer a repo-wide review |
staged |
git diff --cached |
unstaged |
git diff |
staged or unstaged, local changes, working tree |
Review both git diff --cached and git diff, plus relevant untracked files |
branch against main, branch vs main |
Diff against the repo's default branch — detect it with git symbolic-ref refs/remotes/origin/HEAD (fall back to main, then master) — preferring git diff origin/<default>...HEAD, otherwise git diff <default>...HEAD |
this PR, current PR |
Resolve the PR and its base branch with gh pr view; get the diff with gh pr diff (and read the changed files for context). If the hosted diff is noisy, use the local base diff such as origin/<base>...HEAD |
| A commit SHA | git show <sha> |
| A commit range | git diff <range> |
| A file, directory, module, or feature area | Read that area directly and inspect related tests, call sites, and recent changes when relevant |
entire repo, full repo, repo audit |
Perform a comprehensive repository review |
If the chosen diff is empty but the user's wording asks for a broader audit, pivot to that broader audit instead of stopping. If the user truly requested only an empty diff, report that there is nothing in that scope to review.
Before judging code, understand the project:
- Read project instructions and conventions such as
AGENTS.md,README, and top-level config files. - Check repository state with commands like
git status --short,git branch --show-current, and the relevant diff/log command for the chosen scope. - For diff-based reviews, read the changed files with surrounding context, not just the patch.
- For repo-wide reviews, map the directory structure, core modules, entry points, data stores, external APIs, background jobs, UI flows, and test layout.
- Identify the test framework and the smallest useful test commands for the affected areas. When fixes are likely, run those tests once to confirm a green baseline before relying on them for the test-driven fixes in Step 6.
Look for both targeted defects and broader design issues.
Targeted review categories (short cues, not exhaustive checklists):
- Correctness: logic errors, bad state transitions, invalid assumptions, missing edge cases, wrong API usage, data/persistence corruption, broken migrations.
- Concurrency and async: races, deadlocks, actor/thread isolation, TOCTOU, cancellation leaks, lock ordering, unsafe shared state.
- Reliability and error handling: swallowed errors, missing validation, weak retry/accounting, bad fallbacks, missing observability for important failures.
- Performance and scalability: N+1 queries, redundant expensive work, poor data structures, unbounded memory or task growth, hot-path regressions.
- Compatibility and integration: platform/version mismatches, API or schema contract drift, feature-flag mistakes, environment-specific failures.
- Dependencies, build, and release: unsafe dependency or lockfile changes, build/packaging/signing risks, missing rollout or rollback.
- Coupling and maintainability: implicit ordering, stringly-typed interfaces, duplicated logic, unclear boundaries, over-complex control flow.
- API ergonomics and foot-guns: misleading names, surprising defaults, easy-to-misuse patterns, hidden ordering requirements, weak invariants.
- Tests and tooling: missing coverage for changed behavior, brittle tests, helpers that hide real behavior, CI-only failure risks.
- User experience and accessibility: confusing flows, broken empty/loading/error states, focus/keyboard issues, accessibility or localization regressions.
- Documentation and contract consistency: stale comments or docs, misleading PR descriptions, tests or docs that encode a different contract than the code.
- Cleanup: dead code, unused imports, inconsistent naming, unreachable branches.
Architectural observations:
- Missing or blurred boundaries between layers.
- Modules that own too many responsibilities or live in the wrong place.
- Repeated patterns that should become a shared abstraction.
- Data paths, state models, or workflow contracts that will become painful as the codebase grows.
- Strategic simplifications that reduce risk without broad churn.
For repo-wide reviews, work module by module and do not stop at the first plausible issue. For diff reviews, examine changed behavior through its call sites, data flow, and tests. If a suspected issue is only theoretical, verify user reachability and realistic timing before presenting it as a finding.
Collect all findings before presenting them.
Give each finding a severity, ranked by real impact:
- Critical: data loss, crash, severe correctness regression, broken production path.
- Warning: likely user-visible bug, race, reliability gap, significant performance issue, brittle contract.
- Nit: naming, small cleanup, local duplication, minor maintainability issue.
Independently, tag a finding Architecture/Design when it concerns boundaries, responsibilities, shared abstractions, or data/workflow contracts rather than a single localized defect. This tag is orthogonal to severity — a design issue can be anything from Critical to Nit — so order findings by real impact and never demote one just because it is architectural.
Each finding must include:
- Severity, the Architecture/Design tag when it applies, and a short title.
- Exact file paths and line numbers for localized issues; for architectural or cross-cutting findings, cite the relevant files, ranges, or module paths instead.
- A clear description of the broken behavior or design problem.
- Evidence from the code, with short snippets only when useful.
- The user-visible or maintenance impact.
- A recommended fix and, when there are real tradeoffs, up to two alternatives.
- A test plan when the recommended action changes behavior worth testing and a usable test harness exists: name the red-green regression test to add or update. Keep this in the finding body, not crammed into a choice label.
Be complete but honest: include every concrete issue in scope — bugs, inconsistencies, inefficiencies, duplication, brittle contracts, foot-guns, maintainability problems, architectural drift, missing tests, and useful cleanup — and rank them truthfully by severity rather than filtering them out. Do not pad the review with speculative or low-value comments, and if an earlier suspected issue turns out to be false, withdraw it. If the scope contains no actionable issues, say so plainly and stop; never invent findings to fill the review.
Write the collected findings to a scratch file outside the tracked worktree (for example, under the system temp directory) as the source of truth. Record for each: id, severity, Architecture/Design tag, file:line or area, evidence, impact, recommendation, and an empty decision field. Work from this file through Steps 5 and 6 so details survive a long review, and delete it when the review is complete.
This step is strictly about collecting decisions, not implementing them. Do not edit code, write tests, or run fixes during Step 5. Implementation happens in Step 6, after every finding has a decision.
Always walk findings one at a time, highest impact first. Never dump the full list for batch triage, and never collapse nits into a group — one finding at a time is the only mode, no matter how many findings there are.
For each finding, in order:
- Show the finding in a short message: number (
Finding N of M), severity and the Architecture/Design tag if it applies, title, file path and line number (or area), the issue and its user-visible or maintenance impact, and your recommendation with a one-sentence reason. If the recommended action warrants a regression test, describe that test here in the finding body. - Ask the user to choose what to do with this finding. Use the richest interactive prompting mechanism available — if your environment supports buttoned multiple-choice prompts, use them; otherwise present a short numbered list inline and ask the user to reply with a number or a freeform answer. Whichever surface you use, keep the option labels short and always include:
- The recommended action as the first option, clearly marked
(Recommended). - Up to two additional concrete alternatives when real tradeoffs exist.
- A
Skipoption so the user can decline this finding. - Make clear that the user can also respond freeform (override the options, ask for more detail, or stop).
- The recommended action as the first option, clearly marked
- Record the user's decision (chosen action and any notes) in the scratch file. Do not start fixing yet.
- Move on to the next finding and repeat.
If the user challenges a finding, re-check the relevant code and update or withdraw the finding before asking for a decision — don't defend stale analysis. If the user asks for more detail, expand inline before re-asking. If the user says stop, abandon the remaining findings and skip to Step 6 with whatever has been decided so far.
Before moving to Step 6, present a short recap of the decisions collected — one line per finding (Finding N: <title> → <chosen action>), grouped by "Will fix" vs "Skipped". Ask for a final confirmation to proceed with implementation. Only after the user confirms, move to Step 6.
Now implement every fix the user chose, working from the scratch file. Apply accepted Architecture/Design changes first when they would subsume or reshape smaller fixes; otherwise work highest impact first (Critical → Warning → Nit). Keep each change focused on its finding; do not bundle unrelated cleanup.
For any selected fix that changes runtime behavior, data contracts, user-visible behavior, persistence, API semantics, or other logic worth testing, use red-green TDD wherever a usable test harness exists or can reasonably be created. This includes bug fixes, correctness issues, crashes, race conditions, data integrity problems, and any requested non-defect change whose new behavior should be pinned down:
- Write or update a regression test before editing production code.
- Run the smallest test scope that includes the regression test.
- Confirm the test fails for the right reason: the failure must reproduce the reviewed defect, not a typo, missing import, setup error, or unrelated failure.
- Record the relevant failure output so the fix can be verified against it.
- Only after the red test is confirmed, edit the production code.
- Re-run the regression test and confirm it passes.
- Run the broader relevant test suite, formatter, linter, or type checker for the changed area.
Where a usable harness exists, do not skip the failing-first step for defects. If no harness exists or red-green is genuinely impractical for a given fix, note that explicitly and proceed rather than blocking the fix on impossible tests. If a test that should fail cannot be made to fail for the right reason, stop and explain the blocker before writing production code.
Documentation, naming, and mechanical cleanup do not need failing-first tests unless they expose behavior.
After implementing each fix, briefly note in chat which finding it resolved so the user can track progress against the recap. When all selected fixes are done, post a final summary listing what was fixed and what was skipped, and delete the scratch file.
- Respect dirty worktrees. Do not revert or overwrite unrelated user changes.
- Read enough surrounding code to avoid patch-only false positives.
- Prefer the project's existing patterns and test style.
- Keep remediation options concrete and scoped.
- Surface uncertainty clearly when evidence is incomplete.