Date: 2026-05-12 Purpose: Design an optimal system prompt for GLM-5.1 in opencode (a coding agent CLI), informed by the GLM-5 paper, Z.AI docs, and community findings.
| Source | Key Takeaway |
|---|---|
| GLM-5 Paper (arXiv 2602.15763) | Model architecture, training data, thinking modes, agentic RL |
| Z.AI Context Caching Docs | Implicit prefix-based caching, stable system prompts recommended |
| Z.AI GLM-5 Overview | Model specs: 200K context, 128K output, agentic engineering focus |
| Hiveloop GLM System Prompt Guide | OpenAI-compatible API, auto-injected tool descriptions, no official prompt guide from Zhipu |
| GLM-AI.chat Coding Guide | Practical prompting principles, reusable coding system prompt |
From the paper (Section 3.1 — Supervised Fine-Tuning):
- Response style optimized for conciseness and logic compared to GLM-4.5. Verbose prompts may fight the training.
- Trained on real agentic trajectories — tool-calling sessions, coding agent workflows, search agent data. The model is tuned for multi-step tool use and error correction.
- Three thinking modes: Interleaved Thinking (think before every response/tool call), Preserved Thinking (retain thinking blocks across turns), Turn-level Thinking (per-turn control).
- Erroneous trajectory segments retained but masked in loss — model learns error correction without reinforcing bad actions.
From the caching docs:
- Implicit/automatic — no manual cache management. Matches identical content from the start of the message list.
- Prefix-based — system prompt must be identical across requests for cache hits.
- Best practice: "Use stable system prompts" — explicitly recommended.
- Cached tokens reported in
usage.prompt_tokens_details.cached_tokens. - Works across all GLM models (GLM-5, GLM-4.7, GLM-4.6, GLM-4.5).
From Hiveloop guide:
- OpenAI-compatible API at
https://api.z.ai/api/paas/v4/chat/completions - Standard
system/user/assistantroles. Also supports uniqueobservationrole for tool results. - Auto-injects tool descriptions before your system prompt — tool schema is formatted into an internal system prompt that precedes your custom one. Your system prompt comes after.
- No official prompt engineering guide from Zhipu — OpenAI prompt principles generally apply.
From GLM-AI.chat and community sources:
- Rule-list style works best — concise, imperative rules. Not essays.
- State language, version, runtime explicitly for code tasks.
- Ask for plans first on non-trivial tasks, then implement.
- Constrain output format — say whether you want a diff, a file, or a function.
- Iterate, don't restart — the 200K context window is built for multi-turn refinement.
- GLM-5 approaches Claude Opus 4.5 on SWE-bench Verified (77.8%) and real-world coding tasks.
- Strong on repo exploration (65.6% Pass@1 vs Opus 4.5's 64.5%) — strategic search over raw code generation.
- 100% build success rate on React/Vue/Svelte — excellent at generating syntactically correct code.
- Weaker on long-horizon chained tasks (52.3% vs Opus 4.5's 61.6%) — errors compound across steps.
- Keep it short (~100-160 tokens). The model is trained for concise output. Verbose instructions fight the training and waste cached tokens.
- Rule-list, not prose. Imperative statements. Numbered priorities for conflict resolution.
- Trust the training. Don't instruct the model on things it was specifically trained for (tool calling, code generation, conciseness). Only add behavioral rules that differ from default behavior.
- Optimize for cache stability. Use a single shared base prompt across all agents. Differentiate via small user-message suffixes, not different system prompts.
- Don't duplicate what's injected elsewhere. Environment info (model, cwd, date) is injected by opencode. Tool descriptions are auto-injected by GLM. Don't repeat these in the system prompt.
You are opencode, a coding agent.
# Rules (priority order)
Earlier rules win when they conflict.
1. Never commit unless explicitly asked. Never push unless explicitly asked.
2. Never make claims without evidence. If you don't know, say so. Label inferences as such.
3. Never introduce secrets or credentials into code, logs, or commits.
4. Be concise. 1-4 lines unless detail is requested. One-word answers when sufficient.
5. Be direct. Answer the question asked. No preamble, no postamble, no unsolicited summaries.
# Reasoning
Distinguish questions from requests. Questions get answers. Requests get action.
When investigating, share relevant context — not just conclusions.
Admit uncertainty immediately. "I don't know" beats a confident wrong answer.
# Code changes
Understand existing conventions before editing. Read surrounding code, imports, and neighboring files.
Mimic existing style. Use the same libraries and patterns. Never assume a library is available — verify it's used.
Make minimal changes. Don't refactor surrounding code unless asked.
After completing a task, run lint and typecheck. If unknown, ask the user and suggest adding to AGENTS.md.
When running a non-trivial bash command that changes the system, briefly state what it does and why.~160 tokens.
Before (current setup): default.txt (105 lines, ~2000 tokens) as the system prompt. Switching agents = full cache miss on system prompt.
After (proposed): Single prompt.md (~160 tokens) used as the system prompt for all agents. Agent differences injected as user-message <system-reminder> suffixes (~50-100 tokens each). Switching agents = only the small suffix changes, base prompt stays cached.
The original prompts enumerated available tools (Glob, Grep, Read, Bash). This conflicts with two findings:
- GLM auto-injects tool descriptions — the model already sees full tool schemas. Enumerating them in the prompt is redundant.
- MCP and plugins add tools the prompts don't mention — in my case the
opencode-codebase-indexplugin addscodebase_peek,codebase_search,index_codebase; thedocforkMCP addsdocfork_search_docs/docfork_fetch_doc; thesearxngMCP addssearxng_web_search/searxng_web_url_read. The original prompts couldn't guide usage of these tools, leading to bad tool compliance.
The revised prompts use a preference hierarchy instead of an enumeration — telling the model which tool to prefer for which job, without duplicating schema information.
You are a file search specialist. You find files and navigate codebases efficiently.
Tool preference (use earlier tools first):
1. codebase_peek — locate code by meaning, returns metadata only (fastest)
2. codebase_search — semantic search with full code content
3. grep — exact pattern search
4. glob — file name patterns
5. read — inspect specific files
Run index_codebase before using codebase commands if the repo hasn't been indexed yet.
Use bash only for read-only operations. Never create or modify files.
Return absolute paths. Adapt thoroughness to the caller's specified level (quick / medium / very thorough).Changes from original:
- Added
codebase_peek,codebase_search,index_codebaseto preference hierarchy index_codebasementioned as conditional prerequisite, not in the main hierarchy (it's incremental ~50ms if already indexed)- Dropped tool enumeration — model sees schemas from auto-injection
- Dropped emoji rule (GLM training handles this)
- ~50% shorter
You are `scout`, a read-only research agent for external libraries, documentation, and dependency source.
For documentation and web research:
1. docfork_search_docs → docfork_fetch_doc — indexed library documentation
2. searxng_web_search → webfetch — general web search
For dependency source code:
1. repo_clone → grep / glob / read — inspect cloned repositories
For the current workspace only:
1. codebase_peek / codebase_search — semantic search (requires index)
Research standards:
- Cite exact file paths and line references
- Separate verified findings from inferences
- Note branch state if reading a cloned repo
- Say explicitly if a source is inaccessible
Output: direct answer first, then evidence. Keep it scannable. Absolute paths only.
Never modify the user's workspace.Changes from original:
- Added
docfork_search_docs/docfork_fetch_docandsearxng_web_searchto preference hierarchy codebase_peek/codebase_searchscoped to current workspace only (semantic index is repo-local)- Organized by concern (docs/web → dependency source → current workspace) instead of flat tool list
- Dropped tool enumeration and working style instructions that duplicated tool schemas
- ~50% shorter
| Tool | Source | Tools Provided |
|---|---|---|
| opencode-codebase-index | https://github.com/Helweg/opencode-codebase-index | codebase_peek, codebase_search, index_codebase, index_status, index_health_check, index_logs, index_metrics |
| docfork | https://mcp.docfork.com | docfork_search_docs, docfork_fetch_doc |
| searxng | https://github.com/searxng/searxng | searxng_web_search, searxng_web_url_read |
For maximum cache hits, all agents should use the same base system prompt (glm5.1-prompt.md). Agents differentiate via user-message suffixes (plan, build-switch) or agent-specific prompts that are appended, not replacing.
| Agent | Prompt | Notes |
|---|---|---|
| build | prompt.md |
Shared base prompt |
| plan | prompt.md |
Shared base prompt + plan.txt user-message suffix |
| general | prompt.md |
Shared base prompt — no custom text needed, generic subagent |
| explore | explore.md |
Custom prompt (tool preference hierarchy) |
| scout | scout.md |
Custom prompt (tool preference hierarchy) |
| compaction | built-in | Hidden |
| title | built-in | Hidden |
| summary | built-in | Hidden |
Explore and scout use different prompts because they're specialized subagents with specific tool preference hierarchies that diverge significantly from the base. The cache tradeoff is acceptable — they're spawned as subagents, not the primary session agent.