adversarial peer review for LLMs — a priority-ranked panel of agents charges at each other's analysis, the master synthesizes one hardened output
joust is a CLI tool that implements adversarial peer review across a ranked panel of LLM agents. You define a list of agents with priority levels. The highest-priority agent is the master — it orchestrates the joust and writes the final synthesis. Lower-priority agents are jousters — they produce independent analyses and cross-examine each other's work.
The pattern, generalized:
prompt
→ [priority 1: jouster A analysis]
→ [priority 1: jouster B analysis] ← all priority-1 agents run in parallel
→ [priority 1: jouster N analysis]
→ [priority 1: A reviews B, A reviews N]
→ [priority 1: B reviews A, B reviews N] ← cross-exam runs in parallel
→ [priority 1: N reviews A, N reviews B]
→ [priority 0: master receives all analyses + all cross-exams]
→ [priority 0: master synthesizes final output]
Priority 0 is the master. It never jousts — it judges. Priority 1+ agents are jousters. No jouster sees another's analysis until after its own first pass. The master sees everything.
With a single agent (priority 0 only), joust degenerates to a single high-quality pass — still useful, no cross-examination. With one master and one jouster, you get the classic two-model adversarial review. With one master and N jousters, you get a full panel.
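The flow above can be sketched in a few lines of Ruby. This is an illustration of the pattern only, not joust's implementation: the agents are stand-in lambdas rather than provider API calls, and `run_joust` and the review prompt wording are hypothetical.

```ruby
# Illustrative sketch of the joust pattern; not joust's actual code.
# Each agent is a hypothetical stand-in: a lambda that takes a prompt
# String and returns a String. A real agent would call a provider API.

def run_joust(prompt, master:, jousters:)
  # Analysis: every jouster works on the prompt independently, in parallel.
  threads = jousters.map { |name, agent| Thread.new { [name, agent.call(prompt)] } }
  analyses = threads.map(&:value).to_h

  # Cross-examination: every jouster reviews every other jouster, in parallel.
  pairs = jousters.keys.permutation(2).to_a
  threads = pairs.map do |reviewer, target|
    Thread.new do
      request = "Review this analysis of: #{prompt}\n\n#{analyses[target]}"
      [[reviewer, target], jousters[reviewer].call(request)]
    end
  end
  critiques = threads.map(&:value).to_h

  # Synthesis: the master sees the prompt, all analyses, and all critiques.
  briefing = [prompt, *analyses.values, *critiques.values].join("\n---\n")
  { analyses: analyses, critiques: critiques, synthesis: master.call(briefing) }
end
```

With an empty `jousters` hash the sketch skips straight to the master, which matches the single-agent case described above.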
A single LLM pass on a hard question has a failure mode: confident, well-structured, wrong. The model produces a coherent narrative, you ship it, something breaks.
Multiple independent passes catch more. Passes that then attack each other catch much more. A senior master agent that has seen all the analyses and all the critiques and is explicitly tasked with adjudication produces something qualitatively better than any single pass.
This is what smart humans do: a panel of domain experts each writes a review, the reviews are circulated, each expert responds to the others' critiques, and the editor-in-chief synthesizes a final finding. joust automates that loop across models.
The priority/rank model matters because not all agents are equal. You want your best, most expensive, most capable model as master — it should be held in reserve for the hard work of synthesis and adjudication. The jousters can be cheaper, faster, more numerous. Slop budget goes to the jousters. Quality budget goes to the master.
- Developers and architects making non-trivial technical decisions (schema changes, API design, system migrations)
- Researchers who want higher-confidence analysis on complex questions
- Anyone who has been burned by a single-model hallucination on something that mattered
An agent is a named configuration: a model, a provider, a priority, and optional overrides (temperature, system prompt, etc.).
[agents.opus]
provider = "anthropic"
model = "claude-opus-4-6"
priority = 0 # master — synthesizes, never jousts
[agents.sonnet]
provider = "anthropic"
model = "claude-sonnet-4-6"
priority = 1 # jouster
[agents.gemini]
provider = "google"
model = "gemini-2.5-pro"
priority = 1 # jouster
[agents.flash]
provider = "google"
model = "gemini-2.5-flash"
priority = 2 # lower-priority jouster: broad coverage, not depth
Priority semantics:
- priority = 0: master. Exactly one master per joust. Receives all jouster outputs and all cross-examinations. Writes the final synthesis. Never produces a Round 1 analysis.
- priority = 1: primary jousters. Produce Round 1 analyses. Cross-examine all other priority-1 agents. Their outputs are the primary inputs to the master.
- priority = 2+: secondary jousters. Produce Round 1 analyses but are not cross-examined by primary jousters (unless --full-cross is set). Their analyses are passed to the master as supplementary input. Useful for cheap broad coverage: throw a fast, cheap model at the problem for additional surface area.
If no priority = 0 agent is defined, joust uses the agent with the lowest priority number as master. If only one agent is defined (any priority), it runs as a single-pass analysis with no cross-examination.
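The role rules above can be expressed as a small function. This is a sketch under assumptions, not joust's code: `assign_roles` is a hypothetical name, ties for the lowest priority are broken by definition order, and a promoted master is removed from the jouster tiers.

```ruby
# Sketch of the priority rules; names and return shape are hypothetical.
# `agents` maps agent name => priority (Integer).

def assign_roles(agents)
  raise ArgumentError, "no agents defined" if agents.empty?

  zero_count = agents.count { |_name, priority| priority.zero? }
  raise ArgumentError, "more than one priority-0 agent" if zero_count > 1

  # The master is the priority-0 agent or, failing that, the agent with the
  # lowest priority number (assumption: first defined wins a tie).
  master = agents.min_by { |_name, priority| priority }.first

  rest = agents.reject { |name, _priority| name == master }
  {
    master: master,
    primary: rest.select { |_name, priority| priority == 1 }.keys,
    secondary: rest.select { |_name, priority| priority >= 2 }.keys,
  }
end
```

For the four-agent config shown earlier, this yields opus as master, sonnet and gemini as primary jousters, and flash as secondary.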
- Analysis: all jousters (priority 1+) produce independent analyses in parallel. No jouster sees another's output.
- Cross-examination: priority-1 jousters cross-examine each other in parallel. Each receives every other priority-1 analysis and is instructed to find gaps, errors, and missing considerations. Priority-2+ analyses are passed to the master as supplementary but are not cross-examined unless --full-cross is set.
- Synthesis: the master receives the original prompt, all analyses (all priorities), all cross-examinations, and its own (optional) system context. It produces one unified final output.
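To make the cross-examination step concrete, the sketch below lists which reviewer/target pairs would run. It is one reading of the rules, not joust's code: `cross_exam_pairs` is a hypothetical name, and it assumes that with --full-cross the reviewers are still the priority-1 agents while the targets expand to every jouster.

```ruby
# Sketch: reviewer/target pairs for the cross-examination phase.
# `jousters` maps name => priority (1 or higher). Names are hypothetical.

def cross_exam_pairs(jousters, full_cross: false)
  reviewers = jousters.select { |_name, priority| priority == 1 }.keys
  # Assumption: --full-cross widens the targets, not the set of reviewers.
  targets = full_cross ? jousters.keys : reviewers
  # Nobody reviews their own analysis.
  reviewers.product(targets).reject { |reviewer, target| reviewer == target }
end

panel = { "sonnet" => 1, "gemini" => 1, "flash" => 2 }
cross_exam_pairs(panel)
# => [["sonnet", "gemini"], ["gemini", "sonnet"]]
cross_exam_pairs(panel, full_cross: true).length
# => 4
```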
joust --agents opus "Is this migration plan sound?"
One agent, priority 0. No jousters. Runs as a high-quality single-pass analysis with the master's system prompt (which instructs it to be exhaustive, flag uncertainty, look for gaps). No cross-examination. Output is the master's direct response.
Useful when you want joust's prompting discipline without the multi-model overhead.
joust --agents opus,sonnet,gemini,flash,gpt4 "Design the authorization model for peer lots"
One master (opus, priority 0), four jousters. Priority-1 agents cross-examine each other. Priority-2 agents contribute analyses only. Master synthesizes everything. Maximum coverage, maximum cost.
joust [options] <prompt>
joust [options] --file <prompt-file>
-a, --agents <list> comma-separated agent names from config, in priority order
first agent is master (priority 0) unless priorities set in config
-r, --rounds <n> number of cross-exam rounds (default: 1)
-o, --output <file> write output to file
--format <fmt> text (default), md, json
--full include all rounds in output, not just synthesis
--full-cross cross-examine all jousters including priority 2+ (expensive)
--no-synthesis stop after cross-exam, emit all analyses and critiques
--stream stream master synthesis output (default: true)
--system <file> override master system prompt
--temperature <n> temperature for all agents (default: 0.3)
--quiet suppress progress indicators
--dry-run show what would run without calling any APIs
# Simplest — uses default agents from config
joust "What are the security implications of JWT vs session cookies?"
# Single agent — no cross-exam, just high-quality single pass
joust --agents opus "Is this migration plan sound?"
# Classic two-model joust — one master, one jouster
joust --agents opus,gemini "Design a rate limiting strategy"
# Full panel
joust --agents opus,sonnet,gemini,flash "Should we use event sourcing here?"
# From file, markdown output
joust --file architecture-proposal.md --format md -o review.md
# See all rounds
joust --full --format md --agents opus,sonnet,gemini "Critique this API design"
# No synthesis — read the panel yourself
joust --no-synthesis --agents opus,sonnet,gemini "Is GraphQL the right choice?"
# Multiple cross-exam rounds
joust --rounds 3 --agents opus,sonnet,gemini "Peer lots impact analysis"
# Dry run — see the agent lineup and prompt plan without spending tokens
joust --dry-run --agents opus,sonnet,gemini,flash "big question"
joust reads config from ~/.joust/config.toml (or JOUST_CONFIG env var).
[defaults]
agents = ["opus", "sonnet", "gemini"] # default panel
format = "text"
rounds = 1
temperature = 0.3
# Priority 0 — master agent
[agents.opus]
provider = "anthropic"
model = "claude-opus-4-6"
priority = 0
api_key_env = "ANTHROPIC_API_KEY"
# Priority 1 — primary jousters
[agents.sonnet]
provider = "anthropic"
model = "claude-sonnet-4-6"
priority = 1
api_key_env = "ANTHROPIC_API_KEY"
[agents.gemini]
provider = "google"
model = "gemini-2.5-pro"
priority = 1
api_key_env = "GEMINI_API_KEY"
# Priority 2 — secondary jousters (cheap, fast, supplementary)
[agents.flash]
provider = "google"
model = "gemini-2.5-flash"
priority = 2
api_key_env = "GEMINI_API_KEY"
[agents.gpt4]
provider = "openai"
model = "gpt-4o"
priority = 1
api_key_env = "OPENAI_API_KEY"
CLI priority override: When --agents is passed as a list without explicit priorities, the first agent is treated as master (priority 0) and the rest as priority-1 jousters. Priority config in ~/.joust/config.toml takes precedence over position if the agent name is found there.
API keys are also picked up directly from env without config: ANTHROPIC_API_KEY, GEMINI_API_KEY, OPENAI_API_KEY.
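The CLI priority override described above amounts to a lookup with a positional fallback. A minimal sketch, assuming the configured priorities have already been read into a plain hash; `resolve_priorities` and its arguments are hypothetical names, not part of joust.

```ruby
# Sketch of the --agents priority override; names are hypothetical.
# `names` is the --agents list in order. `config_priorities` maps
# agent name => priority for agents that set one in config.toml.

def resolve_priorities(names, config_priorities = {})
  names.each_with_index.map do |name, index|
    positional = index.zero? ? 0 : 1 # first agent is master, the rest joust
    [name, config_priorities.fetch(name, positional)] # config wins if present
  end.to_h
end

resolve_priorities(%w[opus gemini])
# => {"opus"=>0, "gemini"=>1}
resolve_priorities(%w[opus sonnet flash], { "flash" => 2 })
# => {"opus"=>0, "sonnet"=>1, "flash"=>2}
```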
"You are the master analyst. You have received independent analyses from a panel of reviewers, plus each reviewer's critique of the others. Your job is to produce one unified, hardened output. Incorporate all valid critiques. Where analyses conflict, adjudicate explicitly — state which position is correct and why. Where all analyses agree, note the consensus. Do not summarize. Synthesize. The output must be better than any individual input."
"You are performing an independent analysis. Do not hedge. Be exhaustive. Flag uncertainty explicitly. Do not attempt to cover every angle superficially — go deep on what matters. You will not see other analysts' work until after you submit your own."
"You are reviewing another analyst's work. Your job is to find what is wrong, incomplete, or missing. Do not validate what they got right — find the gaps. Be direct and specific. If something looks correct, move on. If something is wrong, say exactly why."
"You are providing supplementary analysis. Be concise. Focus on angles that a primary analyst might miss — unconventional risks, edge cases, second-order effects. Do not attempt to be comprehensive. Flag the two or three things that most concern you."
All system prompts are customizable per-agent in config or globally via --system.
Streams the master's synthesis to stdout. Clean, no framing.
# Joust
**Prompt:** ...
**Master:** opus (claude-opus-4-6)
**Panel:** sonnet (priority 1), gemini (priority 1), flash (priority 2)
**Rounds:** 1
**Date:** 2026-04-13
---
## Synthesis
[master output]
---
## Round 1: Analyses
### sonnet (priority 1)
[analysis]
### gemini (priority 1)
[analysis]
### flash (priority 2 — supplementary)
[analysis]
---
## Round 2: Cross-Examination
### sonnet reviews gemini
[critique]
### gemini reviews sonnet
[critique]
Full structured output: all rounds, all agents, token counts, latency per call, model metadata. Good for piping, archiving, or feeding into a downstream tool.
Phase 1 — Analysis (parallel)
all priority 1+ agents → independent analysis calls
wall time = max(agent latency)
Phase 2 — Cross-examination (parallel)
priority 1 agents → each reviews all other priority-1 analyses
priority 2+ analyses → passed directly to master, not cross-examined (unless --full-cross)
wall time = max(cross-exam latency)
Phase 3 — Synthesis (sequential)
master receives: prompt + all analyses + all cross-exams
master produces final output (streamed)
wall time = master latency
Total wall time ≈ max(P1 analysis) + max(P1 cross-exam) + master synthesis
≈ 20–90s for a typical 3-agent joust
Cost scales with number of jousters and rounds. A 4-jouster panel with 2 cross-exam rounds and an expensive master is the most expensive configuration. --dry-run shows estimated token counts before committing.
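The wall-time formula and the cost scaling can be checked with a back-of-envelope calculation. The sketch below is illustrative only: the latency figures are invented, the function names are hypothetical, and the call count assumes one API call per reviewer/target pair in each cross-exam round.

```ruby
# Back-of-envelope estimates; not joust's --dry-run logic.

def estimate_wall_time(analysis_latencies, cross_exam_latencies, master_latency)
  # A parallel phase takes as long as its slowest call; synthesis is sequential.
  (analysis_latencies.max || 0) + (cross_exam_latencies.max || 0) + master_latency
end

# Assumption: one call per reviewer/target pair, repeated for each round.
def call_count(primary:, secondary: 0, rounds: 1)
  analyses = primary + secondary
  cross_exams = primary * (primary - 1) * rounds
  analyses + cross_exams + 1 # +1 for the master's synthesis
end

# Latencies in seconds, made up for illustration.
estimate_wall_time([12.0, 18.5, 6.0], [15.0, 22.0], 30.0) # => 70.5
call_count(primary: 2, secondary: 1)                       # => 6
call_count(primary: 4, rounds: 2)                          # => 29
```

Under that assumption the cross-exam term grows quadratically with the number of primary jousters, which is why large panels with several rounds get expensive quickly.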
- If a jouster fails in Round 1, joust continues with remaining agents and notes the failure
- If the master fails, joust aborts (no synthesis possible)
- If all jousters fail, joust falls back to master-only single pass
- Failures are always reported in output metadata
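The jouster failure rules above can be sketched as a single decision after the analysis phase. This is an assumption-laden illustration, not joust's code: `plan_after_analysis` and the shape of its return value are hypothetical. A master failure is not modeled here because it simply aborts the run.

```ruby
# Sketch of the jouster failure rules; names and return shape are hypothetical.
# `results` maps agent name => analysis String, or an Exception if the call failed.

def plan_after_analysis(results)
  failed, succeeded = results
    .partition { |_name, outcome| outcome.is_a?(Exception) }
    .map(&:to_h)

  {
    # With no surviving analyses, fall back to a master-only single pass.
    mode: succeeded.empty? ? :master_only : :panel,
    analyses: succeeded,
    # Failures are kept so they can be reported in output metadata.
    failures: failed.transform_values(&:message),
  }
end

plan_after_analysis({ "sonnet" => "analysis text", "gemini" => RuntimeError.new("timeout") })
# => {:mode=>:panel, :analyses=>{"sonnet"=>"analysis text"}, :failures=>{"gemini"=>"timeout"}}
```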
- No web UI
- No conversation memory / multi-turn jousts
- No fine-tuning or custom model support
- No RAG / document ingestion (prompt files are sufficient)
- No team features, sharing, accounts
- Not a general LLM client: joust is for adversarial review only
- joust <prompt> works end to end
- Single-agent, two-agent, and N-agent modes all work
- Priority 0/1/2 agent tiers work
- Claude + Gemini as default agents, OpenAI as optional third
- --full, --format md, --dry-run work
- Config file support (~/.joust/config.toml)
- Ships as a Ruby gem: gem install joust
- README with 4 real worked examples (single, dual, panel, file input)
- --rounds N: multi-round iteration (agents keep attacking until consensus or N rounds exhausted)
- joust diff <file-a> <file-b>: joust two existing documents against each other
- Named joust profiles (joust --profile security-review)
- Inter-priority cross-examination: priority-1 agents review priority-2 analyses (--full-cross)
- Master rebuttal round: after synthesis, jousters get one final response to the master's conclusions
- joust replay <json>: re-run synthesis on saved round data with a different master
- Cost estimation before run (--estimate)
- joust init: interactive config setup