Context Management Comparison: Opencode vs. Pi

This report compares the context management strategies of the Opencode agent and the Pi agent, based on trace analysis from the TODO app experiment on 2026-04-22 using the Kimi 2.5 model.

1. Overview of Context Management Strategies

Opencode

Opencode employs a highly structured, verbose, and comprehensive system prompt. It explicitly defines:

Tool Use Protocol: Detailed instructions on how to use tools, the importance of parallel tool calls, and the expected behaviors after tool results (e.g., "If you anticipate making multiple non-interfering tool calls, you are HIGHLY RECOMMENDED to make them in parallel to significantly improve efficiency.").
Coding Guidelines: Rigid rules for coding from scratch, bug fixes, features, and refactoring (e.g., "Make MINIMAL changes to achieve the goal. This is very important to your performance.").
Research Protocols: Structured guidance for research, including planning, internet searching, and multimedia processing (e.g., "Make plans before doing deep or wide research, to ensure you are always on track.").
Safety/Environment: Explicit constraints about the working directory, operating system, and safety protocols (e.g., "The operating environment is not in a sandbox. Any actions you do will immediately affect the user's system. So you MUST be extremely cautious.").
Project Context: Directs the agent to check AGENTS.md for project-specific conventions (e.g., "Markdown files named AGENTS.md usually contain the background, structure, coding styles, user preferences and other relevant information.").

This approach creates a high-overhead, highly-constrained context, which ensures consistent, safe actions but increases token usage significantly.

Pi

Pi employs a concise, harness-oriented system prompt. It emphasizes:

Role Definition: Focuses on being an "expert coding assistant" operating within a specific harness (e.g., "You are an expert coding assistant operating inside pi, a coding agent harness. You help users by reading files, executing commands, editing code, and writing new files.").
Concise Toolset: Lists available tools briefly without the extensive procedural rules found in Opencode (e.g., "Available tools: - read: Read file contents - bash: Execute bash commands - edit: Make precise file edits - write: Create or overwrite files").
Documentation Focus: Provides clear, direct links to its own documentation for extensions, themes, skills, and SDKs (e.g., Main documentation: /Users/borislau/.nvm/versions/node/v20.20.0/lib/node_modules/@mariozechner/pi-coding-agent/README.md).

This approach creates a low-overhead, flexible context, designed for rapid interaction and task execution, relying more on the agent's internalized knowledge than strict procedural guidelines.

2. Quantitative Performance Analysis (Kimi 2.5)

Phase	Agent	Activity (Runs)	Avg Latency (ms)	Avg Tokens
Research	Opencode	26	46,025.96	19,953.12
	Pi	27	32,339.11	12,574.93
Plan	Opencode	0	N/A	N/A
	Pi	3	5,486.00	2,233.67
Implementation	Opencode	3	6,088.00	12,640.67
	Pi	7	7,213.29	2,442.57
Unknown	Opencode	193	32,750.53	81,813.90
	Pi	5	8,291.00	2,473.00

3. Pairwise Analysis (Examples)

Prompt Scenario	Opencode Context	Pi Context
"Let's go with..."	Massive procedural prompt + context rules.	Concise, assistant-focused prompt + Pi docs links.
"For remaining..."	Massive procedural prompt + context rules.	Concise, assistant-focused prompt + Pi docs links.

3. Findings

Token Efficiency: The significant token usage discrepancy (Opencode using ~37% more tokens in Research phase) is directly attributable to Opencode's expansive system prompt being sent with every single trace.
Context Rigidity vs. Flexibility: Opencode's prompt is designed to enforce specific behaviors and safety constraints, while Pi's prompt is designed to enable assistance within a specific harness.
Workflow Integration: Opencode's strategy is better suited for complex, multi-step engineering tasks requiring strict adherence to conventions (as defined in AGENTS.md), whereas Pi is better suited for faster, more iterative interaction.

4. Context Composition Visualization

Average context window usage per trace against the Kimi K2.5 128k token window.
Each block = 1% ≈ 1,280 tokens · █ system prompt · ▓ messages + tool calls · ░ free space

Opencode  ·  365 traces  ·  avg 61,866 input tokens  ·  48.4% of context window
─────────────────────────────────────────────────────────────────────────────────
█ █ █ █ ▓ ▓ ▓ ▓ ▓ ▓    System prompt   4,677 est. tokens    3.6%
▓ ▓ ▓ ▓ ▓ ▓ ▓ ▓ ▓ ▓    Messages+tools 57,189 tokens        44.7%
▓ ▓ ▓ ▓ ▓ ▓ ▓ ▓ ▓ ▓    Free           66,134 tokens        51.7%
▓ ▓ ▓ ▓ ▓ ▓ ▓ ▓ ▓ ▓
▓ ▓ ▓ ▓ ▓ ▓ ▓ ▓ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░

Pi  ·  49 traces  ·  avg 7,603 input tokens  ·  5.9% of context window
─────────────────────────────────────────────────────────────────────────
█ ▓ ▓ ▓ ▓ ▓ ░ ░ ░ ░    System prompt   1,367 est. tokens    1.1%
░ ░ ░ ░ ░ ░ ░ ░ ░ ░    Messages+tools  6,236 tokens          4.9%
░ ░ ░ ░ ░ ░ ░ ░ ░ ░    Free          120,397 tokens         94.1%
░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░

Side-by-side summary

Metric	Opencode	Pi	Ratio
Traces analyzed	365	49	—
Avg input tokens	61,866	7,603	8.1× more
Avg system prompt tokens (est.)	4,677	1,367	3.4× more
Avg messages in context	101.6	18.6	5.5× more
Avg tool calls per trace	51.7	8.3	6.2× more
Context window used	48.4%	5.9%	8.2× more

System prompt tokens estimated as len(system_message_content) / 4. Agent identity derived from custom_metadata["trace.openrouter.api_key_name"]: opencode-202604 vs pi-20260422.

5. System Prompt Section Breakdown

Each agent's system prompt consists of a static harness (shipped with the agent) and injected project context (read from AGENTS.md / project config at runtime).

System Prompt Composition  ·  each character = 1% of that agent's total sys prompt  ·  left→right = prompt order
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
OPENCODE  4,638 est. tok  ╠IITTTTTTTTTTTTTCCCCCCCCCCCRRRRRRWWWWPPPPPPPUUUUMMMMMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA╣
Pi        1,367 est. tok  ╠OOOLLLLLLGGGGGGGGGGGGGGDDDDDDDDDDDDDDDDDDDDJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ╣

Opencode sections  ·  in prompt order  (51.6% static harness → then injected AGENTS.md)
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 I  Intro / role                2%     94 tok  ██
 T  Tool Use Protocol          13%    598 tok  █████████████
 C  Coding Guidelines          11%    499 tok  ███████████
 R  Research Protocols          6%    281 tok  ██████
 W  Working Environment         4%    173 tok  ████
 P  Project Information         7%    334 tok  ███████
 U  Ultimate Reminders          4%    192 tok  ████
 M  Model/env metadata          5%    221 tok  █████
 A  Injected AGENTS.md         48%  2,247 tok  ████████████████████████████████████████████████

Pi sections  ·  in prompt order  (42.9% static harness → then injected project context)
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 O  Role definition             3%     43 tok  ███
 L  Tool listing                6%     85 tok  ██████
 G  Guidelines                 14%    188 tok  ██████████████
 D  Pi documentation pointers  20%    271 tok  ████████████████████
 J  Injected project context   57%    781 tok  █████████████████████████████████████████████████████████

Measured on a median-length trace per agent.

Opencode (median trace · 4,638 est. tokens)

Section	Est. Tokens	% of Sys Prompt
Intro / role	94	2.0%
Tool Use Protocol	598	12.9%
Coding Guidelines	499	10.7%
Research Protocols	281	6.1%
Working Environment	173	3.7%
Project Information (static)	334	7.2%
Ultimate Reminders	192	4.1%
Model / env metadata	221	4.8%
Static harness subtotal	2,391	51.6%
Injected AGENTS.md content	2,247	48.4%
Total	4,638	100%

Across all 365 Opencode traces: injected AGENTS.md averages 2,096 est. tokens; static harness averages 2,627 est. tokens (varies because env metadata — skills, working dir — differs per project). 97.8% of traces had injected content; 90% fall in the 6–10k char band.

Ultimate Reminders (verbatim) — the closing section of Opencode's static harness, before the injected AGENTS.md:

At any time, you should be HELPFUL, CONCISE, and ACCURATE. Be thorough in your actions — test what you build, verify what you change — not in your explanations.

Never diverge from the requirements and the goals of the task you work on. Stay on track.

Never give the user more than what they want.

Try your best to avoid any hallucination. Do fact checking before providing any factual information.

Think about the best approach, then take action decisively.

Do not give up too early.

ALWAYS, keep it stupidly simple. Do not overcomplicate things.

When the task requires creating or modifying files, always use tools to do so. Never treat displaying code in your response as a substitute for actually writing it to the file system.

Pi has no equivalent section — it ends its static harness at the documentation pointers and trusts the model to supply these behavioral defaults.

Pi (median trace · 1,367 est. tokens)

Section	Est. Tokens	% of Sys Prompt
Role definition	43	3.1%
Tool listing	85	6.2%
Guidelines	188	13.8%
Pi documentation pointers	271	19.8%
Static harness subtotal	587	42.9%
Injected project context	781	57.1%
Total	1,367	100%

Pi's static harness is nearly identical across all 49 traces (5,467–5,480 chars), confirming a fully static prompt. Opencode's varies considerably (2,096–26,698 chars) due to dynamic env metadata and variable AGENTS.md size.

Harness comparison

	Opencode	Pi	Ratio
Static harness (est. tokens)	2,391	587	4.1× more
Injected project context (est. tokens)	2,247	781	2.9× more
Total system prompt	4,638	1,367	3.4× more

Opencode's static harness is 4× larger than Pi's even before project context — driven by its detailed procedural sections (Tool Use Protocol, Coding Guidelines, Research Protocols, Safety/Environment). Pi relies on the model's internalized knowledge instead.

How project context works in both agents

Both agents use the same mechanism: they read AGENTS.md files from the project directory at startup and inject the content into the system prompt. The difference is purely structural:

	Opencode	Pi
Boundary marker	`Instructions from: /path/to/AGENTS.md` (flat text)	`## /path/to/AGENTS.md` heading under `# Project Context`
Dynamic fields injected	Working directory, date, skills list	Working directory, date
Content source	Same `AGENTS.md` files	Same `AGENTS.md` files

Pi's presentation is slightly more readable (path as a Markdown ## heading so content flows naturally below it), while Opencode uses a flat text boundary marker. The injected content itself is identical in principle — whatever is in the project's AGENTS.md.

Why Pi's tool listing costs so much less

Pi's entire tool section (85 est. tokens) is four bullet points:

Available tools:
- read: Read file contents
- bash: Execute bash commands (ls, grep, find, etc.)
- edit: Make precise file edits with exact text replacement
- write: Create or overwrite files

Opencode's Tool Use Protocol (598 est. tokens) is a full procedural manual: when to call tools, how to interpret results, instructions to parallelize calls, what to do after tool results return, and how to use the task subtask delegation tool.

The same pattern applies across every section: Opencode encodes expected behavior explicitly in the prompt; Pi trusts the model's training to supply it. This is the root cause of the 4.1× static harness size difference — not a difference in capability, but a difference in where the behavior contract lives: in the prompt vs. in the model weights.

6. Message Breakdown

Token counts are estimated (chars/4). Tool outputs are returned as user messages in both agents' trace formats, so user message tokens are high — they accumulate tool results across the session.

Average messages per trace

Role	Opencode avg/trace	Pi avg/trace	Ratio
system	1.0	1.0	1.0×
user	58.0	10.2	5.7× more
assistant	42.6	7.4	5.8× more
Total	101.6	18.6	5.5× more

Median messages/trace: Opencode 54, Pi 14 (P90: Opencode 282, Pi 39 — Opencode has a heavy right tail from long agentic runs).

Average tokens per message

Role	Opencode tok/msg	Pi tok/msg	Opencode % of total	Pi % of total
system	4,677	1,367	10.1%	22.8%
user	523	301	65.7%	51.1%
assistant	262	212	24.1%	26.1%

User messages dominate total token usage in both agents because tool outputs accumulate there across the full session. Despite Opencode's larger system prompt, the bulk of its token cost comes from a longer conversation history.

Assistant message types

Type	Opencode avg/trace	Pi avg/trace
Tool calls	33.1 (77.6%)	5.6 (75.3%)
Plain text responses	9.6 (22.4%)	1.8 (24.7%)

~77% of assistant messages are tool invocations in both agents — the ratio is nearly identical, suggesting similar interaction patterns. The difference is session depth, not interaction style.

Opencode top tool calls (365 traces)

Tool	Total calls	Avg/trace
bash	5,600	15.3
read	3,074	8.4
edit	1,471	4.0
write	1,092	3.0
glob	453	1.2

dat-boris/compare-opencode-vs-pi.md

Select an option

No results found