@nibzard
Created March 20, 2026 18:18
Z.ai/GLM models produce garbled output with thinking parameter enabled

Description

When using z.ai endpoints with GLM 4.7 or GLM 5 models, the LLM sometimes returns garbled output: random, unusable tokens. The same endpoints and models work correctly in other harnesses such as CloudCode CLI, which suggests the issue is specific to OpenCode's handling of these models.

Root Cause Analysis

After investigation, the issue appears to be related to the thinking parameter configuration in transform.ts:

// packages/opencode/src/provider/transform.ts:746-751
if (["zai", "zhipuai"].includes(input.model.providerID) && input.model.api.npm === "@ai-sdk/openai-compatible") {
  result["thinking"] = {
    type: "enabled",
    clear_thinking: false,
  }
}

Potential Issues

  1. clear_thinking: false may cause instability - This setting tells the API not to clear thinking content between turns, and the retained content may interfere with the model's token generation, causing garbled output.

  2. Inconsistent interleaved capability configuration - Only glm-4.7 has "interleaved": { "field": "reasoning_content" } configured in the model definitions, while other reasoning-capable GLM models (glm-4.5-flash, glm-4.5, glm-4.7-flash, glm-4.5-air, glm-4.6, glm-4.6v) are missing this capability. This means reasoning content in assistant messages won't be properly serialized for multi-turn conversations.

  3. Parameter format differs from similar providers - Other providers use simpler formats:

    • alibaba-cn: { enable_thinking: true }
    • baseten: { chat_template_args: { enable_thinking: true } }
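The capability gap described in item 2 could be checked mechanically. The following is a minimal sketch, assuming a simplified model-definition shape (`ModelDef`, with a `reasoning` flag and an optional `interleaved` entry); the real schema in the model definitions may differ, and the sample data is illustrative, not copied from models-api.json.

```typescript
// Assumed, simplified shape of a model definition; the real schema may differ.
interface ModelDef {
  id: string
  reasoning: boolean
  interleaved?: { field: string }
}

// Hypothetical helper: return the IDs of reasoning-capable models
// that have no interleaved configuration.
function missingInterleaved(models: ModelDef[]): string[] {
  return models.filter((m) => m.reasoning && !m.interleaved).map((m) => m.id)
}

// Illustrative data mirroring the report, not read from the repository.
const sample: ModelDef[] = [
  { id: "glm-4.7", reasoning: true, interleaved: { field: "reasoning_content" } },
  { id: "glm-4.6", reasoning: true },
  { id: "glm-4.5-air", reasoning: true },
]

// Logs the IDs of the sample models that lack the capability.
console.log(missingInterleaved(sample))
```

Running a check like this against the actual model definitions would show which GLM entries need the interleaved field added.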

Steps to Reproduce

  1. Configure z.ai provider with GLM 4.7 or GLM 5 model
  2. Start a conversation with the model
  3. After several turns (especially with reasoning content), observe garbled/random tokens in the output

Expected Behavior

GLM models should produce coherent output consistent with behavior in other clients (CloudCode CLI, etc.)

Proposed Solutions

  1. Simplify the thinking parameter - Try removing clear_thinking: false:

    result["thinking"] = {
      type: "enabled",
    }

  2. Add interleaved capability to all reasoning GLM models - Update model definitions to include:

    "interleaved": { "field": "reasoning_content" }

  3. Consider testing without the thinking parameter - Temporarily disable it to confirm whether it is the cause
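To compare these options side by side, the three configurations could be produced by one small helper. This is a sketch only: `ThinkingVariant` and `thinkingOptions` are hypothetical names, not part of transform.ts, and the helper only builds the extra request options without sending anything to the API.

```typescript
// Hypothetical variant names for A/B testing the thinking parameter.
type ThinkingVariant = "current" | "simplified" | "disabled"

// Hypothetical helper: build the extra request options for one variant.
function thinkingOptions(variant: ThinkingVariant): Record<string, unknown> {
  switch (variant) {
    case "current":
      // What transform.ts sends today, per the snippet in this report.
      return { thinking: { type: "enabled", clear_thinking: false } }
    case "simplified":
      // Proposed solution 1: drop clear_thinking.
      return { thinking: { type: "enabled" } }
    case "disabled":
      // Proposed solution 3: omit the parameter entirely.
      return {}
  }
}

// Print each variant so the payloads can be compared directly.
const variants: ThinkingVariant[] = ["current", "simplified", "disabled"]
for (const v of variants) {
  console.log(v, JSON.stringify(thinkingOptions(v)))
}
```

Running the same multi-turn conversation once per variant would help isolate whether clear_thinking: false, or the thinking parameter as a whole, correlates with the garbled output.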

Environment

  • OpenCode version: current dev branch
  • Affected models: GLM 4.7, GLM 5 (and potentially other z.ai reasoning models)
  • Provider: z.ai / zhipuai

Related Files

  • packages/opencode/src/provider/transform.ts (thinking parameter configuration)
  • packages/opencode/test/tool/fixtures/models-api.json (model capabilities)

Additional Context

Investigation also covered:

  • The zhipuai Python SDK (v2.1.5), which shows that the API accepts a thinking parameter but does not define a specific schema for it
  • The AI SDK's openai-compatible provider, which correctly handles reasoning_content in streaming responses
  • The flow of providerOptions through the AI SDK, which correctly passes the thinking parameter to the API