Skip to content

Instantly share code, notes, and snippets.

@apnea
Last active May 13, 2026 10:26
Show Gist options
  • Select an option

  • Save apnea/e9dd7a650bdc3300375fffc54592f48d to your computer and use it in GitHub Desktop.

Select an option

Save apnea/e9dd7a650bdc3300375fffc54592f48d to your computer and use it in GitHub Desktop.
GLM-5/5.1 System Prompt Research & Design for opencode

GLM-5/5.1 System Prompt Research & Design

Date: 2026-05-12 Purpose: Design an optimal system prompt for GLM-5.1 in opencode (a coding agent CLI), informed by the GLM-5 paper, Z.AI docs, and community findings.


Sources

Source Key Takeaway
GLM-5 Paper (arXiv 2602.15763) Model architecture, training data, thinking modes, agentic RL
Z.AI Context Caching Docs Implicit prefix-based caching, stable system prompts recommended
Z.AI GLM-5 Overview Model specs: 200K context, 128K output, agentic engineering focus
Hiveloop GLM System Prompt Guide OpenAI-compatible API, auto-injected tool descriptions, no official prompt guide from Zhipu
GLM-AI.chat Coding Guide Practical prompting principles, reusable coding system prompt

Key Findings

1. GLM-5 Training Optimizations

From the paper (Section 3.1 — Supervised Fine-Tuning):

  • Response style optimized for conciseness and logic compared to GLM-4.5. Verbose prompts may fight the training.
  • Trained on real agentic trajectories — tool-calling sessions, coding agent workflows, search agent data. The model is tuned for multi-step tool use and error correction.
  • Three thinking modes: Interleaved Thinking (think before every response/tool call), Preserved Thinking (retain thinking blocks across turns), Turn-level Thinking (per-turn control).
  • Erroneous trajectory segments retained but masked in loss — model learns error correction without reinforcing bad actions.

2. Context Caching (Z.AI)

From the caching docs:

  • Implicit/automatic — no manual cache management. Matches identical content from the start of the message list.
  • Prefix-based — system prompt must be identical across requests for cache hits.
  • Best practice: "Use stable system prompts" — explicitly recommended.
  • Cached tokens reported in usage.prompt_tokens_details.cached_tokens.
  • Works across all GLM models (GLM-5, GLM-4.7, GLM-4.6, GLM-4.5).

3. API & Prompt Format

From Hiveloop guide:

  • OpenAI-compatible API at https://api.z.ai/api/paas/v4/chat/completions
  • Standard system/user/assistant roles. Also supports unique observation role for tool results.
  • Auto-injects tool descriptions before your system prompt — tool schema is formatted into an internal system prompt that precedes your custom one. Your system prompt comes after.
  • No official prompt engineering guide from Zhipu — OpenAI prompt principles generally apply.

4. Effective Prompting Patterns

From GLM-AI.chat and community sources:

  • Rule-list style works best — concise, imperative rules. Not essays.
  • State language, version, runtime explicitly for code tasks.
  • Ask for plans first on non-trivial tasks, then implement.
  • Constrain output format — say whether you want a diff, a file, or a function.
  • Iterate, don't restart — the 200K context window is built for multi-turn refinement.

5. Agentic Capabilities (from paper Section 6.2)

  • GLM-5 approaches Claude Opus 4.5 on SWE-bench Verified (77.8%) and real-world coding tasks.
  • Strong on repo exploration (65.6% Pass@1 vs Opus 4.5's 64.5%) — strategic search over raw code generation.
  • 100% build success rate on React/Vue/Svelte — excellent at generating syntactically correct code.
  • Weaker on long-horizon chained tasks (52.3% vs Opus 4.5's 61.6%) — errors compound across steps.

Design Principles (Derived from Findings)

  1. Keep it short (~100-160 tokens). The model is trained for concise output. Verbose instructions fight the training and waste cached tokens.
  2. Rule-list, not prose. Imperative statements. Numbered priorities for conflict resolution.
  3. Trust the training. Don't instruct the model on things it was specifically trained for (tool calling, code generation, conciseness). Only add behavioral rules that differ from default behavior.
  4. Optimize for cache stability. Use a single shared base prompt across all agents. Differentiate via small user-message suffixes, not different system prompts.
  5. Don't duplicate what's injected elsewhere. Environment info (model, cwd, date) is injected by opencode. Tool descriptions are auto-injected by GLM. Don't repeat these in the system prompt.

The Prompt

You are opencode, a coding agent.

# Rules (priority order)

Earlier rules win when they conflict.

1. Never commit unless explicitly asked. Never push unless explicitly asked.
2. Never make claims without evidence. If you don't know, say so. Label inferences as such.
3. Never introduce secrets or credentials into code, logs, or commits.
4. Be concise. 1-4 lines unless detail is requested. One-word answers when sufficient.
5. Be direct. Answer the question asked. No preamble, no postamble, no unsolicited summaries.

# Reasoning

Distinguish questions from requests. Questions get answers. Requests get action.
When investigating, share relevant context — not just conclusions.
Admit uncertainty immediately. "I don't know" beats a confident wrong answer.

# Code changes

Understand existing conventions before editing. Read surrounding code, imports, and neighboring files.
Mimic existing style. Use the same libraries and patterns. Never assume a library is available — verify it's used.
Make minimal changes. Don't refactor surrounding code unless asked.
After completing a task, run lint and typecheck. If unknown, ask the user and suggest adding to AGENTS.md.
When running a non-trivial bash command that changes the system, briefly state what it does and why.

~160 tokens.


Cache Impact Analysis

Before (current setup): default.txt (105 lines, ~2000 tokens) as the system prompt. Switching agents = full cache miss on system prompt.

After (proposed): Single prompt.md (~160 tokens) used as the system prompt for all agents. Agent differences injected as user-message <system-reminder> suffixes (~50-100 tokens each). Switching agents = only the small suffix changes, base prompt stays cached.


Subagent Prompts: Explore & Scout

Design Rationale

The original prompts enumerated available tools (Glob, Grep, Read, Bash). This conflicts with two findings:

  1. GLM auto-injects tool descriptions — the model already sees full tool schemas. Enumerating them in the prompt is redundant.
  2. MCP and plugins add tools the prompts don't mention — in my case the opencode-codebase-index plugin adds codebase_peek, codebase_search, index_codebase; the docfork MCP adds docfork_search_docs/docfork_fetch_doc; the searxng MCP adds searxng_web_search/searxng_web_url_read. The original prompts couldn't guide usage of these tools, leading to bad tool compliance.

The revised prompts use a preference hierarchy instead of an enumeration — telling the model which tool to prefer for which job, without duplicating schema information.

Explore Agent (explore.md)

You are a file search specialist. You find files and navigate codebases efficiently.

Tool preference (use earlier tools first):
1. codebase_peek — locate code by meaning, returns metadata only (fastest)
2. codebase_search — semantic search with full code content
3. grep — exact pattern search
4. glob — file name patterns
5. read — inspect specific files

Run index_codebase before using codebase commands if the repo hasn't been indexed yet.

Use bash only for read-only operations. Never create or modify files.

Return absolute paths. Adapt thoroughness to the caller's specified level (quick / medium / very thorough).

Changes from original:

  • Added codebase_peek, codebase_search, index_codebase to preference hierarchy
  • index_codebase mentioned as conditional prerequisite, not in the main hierarchy (it's incremental ~50ms if already indexed)
  • Dropped tool enumeration — model sees schemas from auto-injection
  • Dropped emoji rule (GLM training handles this)
  • ~50% shorter

Scout Agent (scout.md)

You are `scout`, a read-only research agent for external libraries, documentation, and dependency source.

For documentation and web research:
1. docfork_search_docs → docfork_fetch_doc — indexed library documentation
2. searxng_web_search → webfetch — general web search

For dependency source code:
1. repo_clone → grep / glob / read — inspect cloned repositories

For the current workspace only:
1. codebase_peek / codebase_search — semantic search (requires index)

Research standards:
- Cite exact file paths and line references
- Separate verified findings from inferences
- Note branch state if reading a cloned repo
- Say explicitly if a source is inaccessible

Output: direct answer first, then evidence. Keep it scannable. Absolute paths only.
Never modify the user's workspace.

Changes from original:

  • Added docfork_search_docs/docfork_fetch_doc and searxng_web_search to preference hierarchy
  • codebase_peek/codebase_search scoped to current workspace only (semantic index is repo-local)
  • Organized by concern (docs/web → dependency source → current workspace) instead of flat tool list
  • Dropped tool enumeration and working style instructions that duplicated tool schemas
  • ~50% shorter

Plugin & MCP References

Tool Source Tools Provided
opencode-codebase-index https://github.com/Helweg/opencode-codebase-index codebase_peek, codebase_search, index_codebase, index_status, index_health_check, index_logs, index_metrics
docfork https://mcp.docfork.com docfork_search_docs, docfork_fetch_doc
searxng https://github.com/searxng/searxng searxng_web_search, searxng_web_url_read

Cache Consistency: All Agents

For maximum cache hits, all agents should use the same base system prompt (glm5.1-prompt.md). Agents differentiate via user-message suffixes (plan, build-switch) or agent-specific prompts that are appended, not replacing.

Agent Prompt Notes
build prompt.md Shared base prompt
plan prompt.md Shared base prompt + plan.txt user-message suffix
general prompt.md Shared base prompt — no custom text needed, generic subagent
explore explore.md Custom prompt (tool preference hierarchy)
scout scout.md Custom prompt (tool preference hierarchy)
compaction built-in Hidden
title built-in Hidden
summary built-in Hidden

Explore and scout use different prompts because they're specialized subagents with specific tool preference hierarchies that diverge significantly from the base. The cache tradeoff is acceptable — they're spawned as subagents, not the primary session agent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment