@ahoward
Last active April 14, 2026 00:13

joust — adversarial peer review CLI for LLMs

joust

adversarial peer review for LLMs — a priority-ranked panel of agents charges at each other's analysis, the master synthesizes one hardened output


What It Is

joust is a CLI tool that implements adversarial peer review across a ranked panel of LLM agents. You define a list of agents with priority levels. The highest-priority agent is the master — it orchestrates the joust and writes the final synthesis. Lower-priority agents are jousters — they produce independent analyses and cross-examine each other's work.

The pattern, generalized:

prompt
  → [priority 1: jouster A analysis]
  → [priority 1: jouster B analysis]     ← all priority-1 agents run in parallel
  → [priority 1: jouster N analysis]
  → [priority 1: A reviews B, A reviews N]
  → [priority 1: B reviews A, B reviews N]   ← cross-exam runs in parallel
  → [priority 1: N reviews A, N reviews B]
  → [priority 0: master receives all analyses + all cross-exams]
  → [priority 0: master synthesizes final output]

Priority 0 is the master. It never jousts — it judges. Priority 1+ agents are jousters. No jouster sees another's analysis until after its own first pass. The master sees everything.

With a single agent (priority 0 only), joust degenerates to a single high-quality pass — still useful, no cross-examination. With one master and one jouster, you get the classic two-model adversarial review. With one master and N jousters, you get a full panel.


Why It Exists

A single LLM pass on a hard question has a failure mode: confident, well-structured, wrong. The model produces a coherent narrative, you ship it, something breaks.

Multiple independent passes catch more. Passes that then attack each other catch much more. A senior master agent that has seen all the analyses and all the critiques and is explicitly tasked with adjudication produces something qualitatively better than any single pass.

This is what smart humans do: a panel of domain experts each writes a review, the reviews are circulated, each expert responds to the others' critiques, and the editor-in-chief synthesizes a final finding. joust automates that loop across models.

The priority/rank model matters because not all agents are equal. You want your best, most expensive, most capable model as master — it should be held in reserve for the hard work of synthesis and adjudication. The jousters can be cheaper, faster, more numerous. Slop budget goes to the jousters. Quality budget goes to the master.


Who It's For

  • Developers and architects making non-trivial technical decisions (schema changes, API design, system migrations)
  • Researchers who want higher-confidence analysis on complex questions
  • Anyone who has been burned by a single-model hallucination on something that mattered

Core Concepts

Agents

An agent is a named configuration: a model, a provider, a priority, and optional overrides (temperature, system prompt, etc.).

[agents.opus]
provider  = "anthropic"
model     = "claude-opus-4-6"
priority  = 0          # master — synthesizes, never jousts

[agents.sonnet]
provider  = "anthropic"
model     = "claude-sonnet-4-6"
priority  = 1          # jouster

[agents.gemini]
provider  = "google"
model     = "gemini-2.5-pro"
priority  = 1          # jouster

[agents.flash]
provider  = "google"
model     = "gemini-2.5-flash"
priority  = 2          # lower-priority jouster — used for broad coverage, not depth

Priority semantics:

  • priority = 0 (master). Exactly one master per joust. Receives all jouster outputs and all cross-examinations. Writes the final synthesis. Never produces a Round 1 analysis.
  • priority = 1 (primary jousters). Produce Round 1 analyses. Cross-examine all other priority-1 agents. Their outputs are the primary inputs to the master.
  • priority = 2+ (secondary jousters). Produce Round 1 analyses but are not cross-examined by primary jousters (unless --full-cross is set). Their analyses are passed to the master as supplementary input. Useful for cheap broad coverage: throw a fast, cheap model at the problem for additional surface area.

If no priority = 0 agent is defined, joust uses the lowest-numbered priority agent as master. If only one agent is defined (any priority), it runs as a single-pass analysis with no cross-examination.
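The priority rules above can be sketched as a small role-assignment function. This is an illustration in Python, not joust's actual code (joust ships as a Ruby gem); the `Agent` class and `assign_roles` name are hypothetical, and breaking ties by config order when no priority-0 agent exists is an assumption the text does not settle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    priority: int


def assign_roles(agents, full_cross=False):
    """Split a panel into (master, cross-examining jousters, analysis-only jousters)."""
    if not agents:
        raise ValueError("at least one agent is required")
    # sorted() is stable, so agents with equal priority keep their config order.
    ranked = sorted(agents, key=lambda a: a.priority)
    # The lowest-numbered priority is the master, whether or not it is 0.
    master, jousters = ranked[0], ranked[1:]
    if master.priority == 0 and any(a.priority == 0 for a in jousters):
        raise ValueError("exactly one priority-0 master is allowed")
    if full_cross:
        # --full-cross: every jouster takes part in cross-examination.
        return master, jousters, []
    primary = [a for a in jousters if a.priority == 1]
    secondary = [a for a in jousters if a.priority >= 2]
    return master, primary, secondary
```

With a single agent, both jouster lists come back empty, which corresponds to the single-pass case.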

Rounds

  1. Analysis — all jousters (priority 1+) produce independent analyses in parallel. No jouster sees another's output.
  2. Cross-examination — priority-1 jousters cross-examine each other in parallel. Each receives every other priority-1 analysis and is instructed to find gaps, errors, and missing considerations. Priority-2+ analyses are passed to the master as supplementary but are not cross-examined unless --full-cross.
  3. Synthesis — the master receives: the original prompt, all analyses (all priorities), all cross-examinations, and its own (optional) system context. It produces one unified final output.
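A minimal sketch of the three rounds, written in Python for illustration only. Agents are plain name strings here, and `call(agent, role, payload)` is a hypothetical stand-in for a provider API call; the role labels and payload keys are assumptions, not joust's interface. It also assumes one review call per (reviewer, reviewee) pair.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import permutations


def run_joust(prompt, master, primary, secondary, call):
    """Run analysis, cross-examination, and synthesis with a pluggable `call`."""
    jousters = primary + secondary
    if not jousters:
        # Single-agent case: the master answers directly, no cross-examination.
        return call(master, "analysis", {"prompt": prompt})

    with ThreadPoolExecutor(max_workers=len(jousters)) as pool:
        # Round 1: every jouster analyzes the prompt independently, in parallel.
        texts = pool.map(lambda a: call(a, "analysis", {"prompt": prompt}), jousters)
        analyses = dict(zip(jousters, texts))

        # Round 2: each primary jouster reviews every other primary analysis.
        pairs = list(permutations(primary, 2))  # (reviewer, reviewee)
        reviews = pool.map(
            lambda p: call(p[0], "cross-exam", {"prompt": prompt, "analysis": analyses[p[1]]}),
            pairs,
        )
        cross_exams = dict(zip(pairs, reviews))

    # Round 3: the master sees the prompt, all analyses, and all critiques.
    payload = {"prompt": prompt, "analyses": analyses, "cross_exams": cross_exams}
    return call(master, "synthesis", payload)
```

Secondary jousters appear in `analyses` but never in `pairs`, which matches the default behavior without --full-cross.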

The Single-Agent Case

joust --agents opus "Is this migration plan sound?"

One agent, priority 0. No jousters. Runs as a high-quality single-pass analysis with the master's system prompt (which instructs it to be exhaustive, flag uncertainty, look for gaps). No cross-examination. Output is the master's direct response.

Useful when you want joust's prompting discipline without the multi-model overhead.

The N-Jouster Case

joust --agents opus,sonnet,gemini,flash,gpt4 "Design the authorization model for peer lots"

One master (opus, priority 0), four jousters. Priority-1 agents cross-examine each other. Priority-2 agents contribute analyses only. Master synthesizes everything. Maximum coverage, maximum cost.


CLI

joust [options] <prompt>
joust [options] --file <prompt-file>

Options

-a, --agents <list>       comma-separated agent names from config, in priority order
                          first agent is master (priority 0) unless priorities set in config
-r, --rounds <n>          number of cross-exam rounds (default: 1)
-o, --output <file>       write output to file
--format <fmt>            text (default), md, json
--full                    include all rounds in output, not just synthesis
--full-cross              cross-examine all jousters including priority 2+ (expensive)
--no-synthesis            stop after cross-exam, emit all analyses and critiques
--stream                  stream master synthesis output (default: true)
--system <file>           override master system prompt
--temperature <n>         temperature for all agents (default: 0.3)
--quiet                   suppress progress indicators
--dry-run                 show what would run without calling any APIs

Examples

# Simplest — uses default agents from config
joust "What are the security implications of JWT vs session cookies?"

# Single agent — no cross-exam, just high-quality single pass
joust --agents opus "Is this migration plan sound?"

# Classic two-model joust — one master, one jouster
joust --agents opus,gemini "Design a rate limiting strategy"

# Full panel
joust --agents opus,sonnet,gemini,flash "Should we use event sourcing here?"

# From file, markdown output
joust --file architecture-proposal.md --format md -o review.md

# See all rounds
joust --full --format md --agents opus,sonnet,gemini "Critique this API design"

# No synthesis — read the panel yourself
joust --no-synthesis --agents opus,sonnet,gemini "Is GraphQL the right choice?"

# Multiple cross-exam rounds
joust --rounds 3 --agents opus,sonnet,gemini "Peer lots impact analysis"

# Dry run — see the agent lineup and prompt plan without spending tokens
joust --dry-run --agents opus,sonnet,gemini,flash "big question"

Configuration

joust reads config from ~/.joust/config.toml (or JOUST_CONFIG env var).

[defaults]
agents      = ["opus", "sonnet", "gemini"]   # default panel
format      = "text"
rounds      = 1
temperature = 0.3

# Priority 0 — master agent
[agents.opus]
provider    = "anthropic"
model       = "claude-opus-4-6"
priority    = 0
api_key_env = "ANTHROPIC_API_KEY"

# Priority 1 — primary jousters
[agents.sonnet]
provider    = "anthropic"
model       = "claude-sonnet-4-6"
priority    = 1
api_key_env = "ANTHROPIC_API_KEY"

[agents.gemini]
provider    = "google"
model       = "gemini-2.5-pro"
priority    = 1
api_key_env = "GEMINI_API_KEY"

# Priority 2 — secondary jousters (cheap, fast, supplementary)
[agents.flash]
provider    = "google"
model       = "gemini-2.5-flash"
priority    = 2
api_key_env = "GEMINI_API_KEY"

# Priority 1: optional third provider (primary jouster)
[agents.gpt4]
provider    = "openai"
model       = "gpt-4o"
priority    = 1
api_key_env = "OPENAI_API_KEY"

CLI priority override: When --agents is passed as a list without explicit priorities, the first agent is treated as master (priority 0) and the rest as priority-1 jousters. Priority config in ~/.joust/config.toml takes precedence over position if the agent name is found there.
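The precedence rule can be sketched as follows, in Python for illustration. `resolve_priorities` and its arguments are hypothetical names; `config_priorities` stands for a name-to-priority mapping already read from the config file.

```python
def resolve_priorities(cli_agents, config_priorities):
    """Map each name from --agents to a priority.

    A priority found in config wins. Otherwise list position decides:
    the first name is the master (0), every later name a primary jouster (1).
    """
    resolved = {}
    for position, name in enumerate(cli_agents):
        positional = 0 if position == 0 else 1
        resolved[name] = config_priorities.get(name, positional)
    return resolved
```

One consequence worth noting: under this rule, `--agents flash,opus` with the config shown above still yields opus as master, because both names carry configured priorities and position is ignored.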

API keys are also picked up directly from env without config: ANTHROPIC_API_KEY, GEMINI_API_KEY, OPENAI_API_KEY.
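A sketch of how config location and key lookup might fit together, in Python for illustration. The function names are hypothetical, and two behaviors are assumptions rather than documented facts: that JOUST_CONFIG takes precedence over the default path when set, and that the per-provider default variables are keyed by the `provider` value.

```python
import os
from pathlib import Path

# Default environment variables per provider, as listed in this document.
DEFAULT_KEY_ENV = {
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def config_path(env=os.environ):
    """Assumed precedence: JOUST_CONFIG if set, else ~/.joust/config.toml."""
    override = env.get("JOUST_CONFIG")
    return Path(override) if override else Path.home() / ".joust" / "config.toml"


def api_key_for(agent, env=os.environ):
    """Look up a key via the agent's api_key_env, else the provider default."""
    var = agent.get("api_key_env") or DEFAULT_KEY_ENV.get(agent["provider"])
    key = env.get(var) if var else None
    if not key:
        raise KeyError(f"no API key found in {var} for provider {agent['provider']!r}")
    return key
```

Passing `env` explicitly keeps the lookup testable without touching the real environment.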


Prompting Strategy

Master system prompt (priority 0)

"You are the master analyst. You have received independent analyses from a panel of reviewers, plus each reviewer's critique of the others. Your job is to produce one unified, hardened output. Incorporate all valid critiques. Where analyses conflict, adjudicate explicitly — state which position is correct and why. Where all analyses agree, note the consensus. Do not summarize. Synthesize. The output must be better than any individual input."

Primary jouster system prompt (priority 1 — Round 1)

"You are performing an independent analysis. Do not hedge. Be exhaustive. Flag uncertainty explicitly. Do not attempt to cover every angle superficially — go deep on what matters. You will not see other analysts' work until after you submit your own."

Cross-examination prompt (priority 1 — Round 2)

"You are reviewing another analyst's work. Your job is to find what is wrong, incomplete, or missing. Do not validate what they got right — find the gaps. Be direct and specific. If something looks correct, move on. If something is wrong, say exactly why."

Secondary jouster system prompt (priority 2+)

"You are providing supplementary analysis. Be concise. Focus on angles that a primary analyst might miss — unconventional risks, edge cases, second-order effects. Do not attempt to be comprehensive. Flag the two or three things that most concern you."

All system prompts are customizable per-agent in config. The master system prompt can also be overridden from the command line via --system.


Output Format

Default (text)

Streams the master's synthesis to stdout. Clean, no framing.

--format md

# Joust

**Prompt:** ...
**Master:** opus (claude-opus-4-6)
**Panel:** sonnet (priority 1), gemini (priority 1), flash (priority 2)
**Rounds:** 1
**Date:** 2026-04-13

---

## Synthesis

[master output]

---

## Round 1: Analyses

### sonnet (priority 1)
[analysis]

### gemini (priority 1)
[analysis]

### flash (priority 2 — supplementary)
[analysis]

---

## Round 2: Cross-Examination

### sonnet reviews gemini
[critique]

### gemini reviews sonnet
[critique]

--format json

Full structured output — all rounds, all agents, token counts, latency per call, model metadata. Good for piping, archiving, or feeding into a downstream tool.


Execution Model

Phase 1 — Analysis (parallel)
  all priority 1+ agents → independent analysis calls
  wall time = max(agent latency)

Phase 2 — Cross-examination (parallel)
  priority 1 agents → each reviews all other priority-1 analyses
  priority 2+ analyses → passed directly to master, not cross-examined (unless --full-cross)
  wall time = max(cross-exam latency)

Phase 3 — Synthesis (sequential)
  master receives: prompt + all analyses + all cross-exams
  master produces final output (streamed)
  wall time = master latency

Total wall time ≈ max(jouster analysis) + max(cross-exam) + master synthesis
                ≈ 20–90s for a typical 3-agent joust

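Cost scales with the number of jousters and rounds. Because every primary jouster reviews every other primary jouster, cross-examination grows roughly quadratically with panel size. A 4-jouster panel with 2 cross-exam rounds and an expensive master sits at the costly end. --dry-run shows estimated token counts before committing.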
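To make the scaling concrete, here is a back-of-the-envelope sketch in Python. `call_plan` and `wall_time` are hypothetical helpers; the call count assumes one API call per (reviewer, reviewee) pair per round, and `wall_time` models a single cross-exam round.

```python
def call_plan(n_primary, n_secondary=0, rounds=1, full_cross=False):
    """Count API calls per phase for a given panel shape."""
    jousters = n_primary + n_secondary
    if jousters == 0:
        # Single-agent case: one master call, nothing else.
        return {"analysis": 0, "cross_exam": 0, "synthesis": 1, "total": 1}
    reviewers = jousters if full_cross else n_primary
    analysis = jousters
    # Each reviewer critiques every other reviewer once per round.
    cross_exam = reviewers * (reviewers - 1) * rounds
    return {
        "analysis": analysis,
        "cross_exam": cross_exam,
        "synthesis": 1,
        "total": analysis + cross_exam + 1,
    }


def wall_time(analysis_latencies, cross_exam_latencies, master_latency):
    """Parallel phases cost their slowest call; synthesis runs after both."""
    return (
        max(analysis_latencies, default=0.0)
        + max(cross_exam_latencies, default=0.0)
        + master_latency
    )
```

Under these assumptions, four primary jousters with two rounds come to 4 analyses, 24 reviews, and 1 synthesis, so 29 calls in total, against 6 calls for two primary jousters plus one secondary in a single round.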


Error Handling

  • If a jouster fails in Round 1, joust continues with remaining agents and notes the failure
  • If the master fails, joust aborts (no synthesis possible)
  • If all jousters fail, joust falls back to master-only single pass
  • Failures are always reported in output metadata
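The failure rules above can be sketched like this, in Python for illustration. `analyze`, `synthesize`, and `single_pass` are hypothetical callables standing in for provider calls, and Round 1 runs sequentially here only to keep the sketch short.

```python
def joust_with_fallback(prompt, jousters, analyze, synthesize, single_pass):
    """Apply the failure policy: tolerate jouster errors, never master errors."""
    analyses, failures = {}, {}
    for agent in jousters:
        try:
            analyses[agent] = analyze(agent, prompt)
        except Exception as exc:  # a real client would catch provider-specific errors
            # A failed jouster is recorded and the joust continues without it.
            failures[agent] = str(exc)

    # Master errors are deliberately not caught: without a master the joust aborts.
    if analyses:
        output, mode = synthesize(prompt, analyses), "synthesis"
    else:
        # Every jouster failed: fall back to a master-only single pass.
        output, mode = single_pass(prompt), "master-only"

    # Failures always travel with the result so they can be reported as metadata.
    return {"output": output, "mode": mode, "failures": failures}
```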

Non-Goals (v1)

  • No web UI
  • No conversation memory / multi-turn jousts
  • No fine-tuning or custom model support
  • No RAG / document ingestion (prompt files are sufficient)
  • No team features, sharing, accounts
  • Not a general LLM client — joust is for adversarial review only

v1 Scope

  • joust <prompt> works end to end
  • Single-agent, two-agent, and N-agent modes all work
  • Priority 0/1/2 agent tiers work
  • Claude + Gemini as default agents, OpenAI as optional third
  • --full, --format md, --dry-run work
  • Config file support (~/.joust/config.toml)
  • Ships as a Ruby gem: gem install joust
  • README with 4 real worked examples (single, dual, panel, file input)

v2 Ideas

  • --rounds N — multi-round iteration (agents keep attacking until consensus or N rounds exhausted)
  • joust diff <file-a> <file-b> — joust two existing documents against each other
  • Named joust profiles (joust --profile security-review)
  • Inter-priority cross-examination: priority-1 agents review priority-2 analyses (--full-cross)
  • Master rebuttal round: after synthesis, jousters get one final response to the master's conclusions
  • joust replay <json> — re-run synthesis on saved round data with a different master
  • Cost estimation before run (--estimate)
  • joust init — interactive config setup