@piotrmasior
Created February 7, 2026 22:25
Fix: Orphaned tool_result after aborted assistant messages — proposal

Fix: Orphaned tool_result After Aborted Assistant Messages

The Bug (TL;DR)

When an assistant message is aborted mid-stream (e.g., after context compaction at 200k tokens), its toolCall blocks, still carrying partialJson, are persisted to the session JSONL. OpenClaw's repair mechanism then inserts synthetic tool_result entries for them. At API call time, however, transformMessages() drops the aborted assistant message entirely (stopReason === "aborted") but keeps the synthetic tool_result. The result is an orphaned tool_result, and every subsequent API call fails with an unexpected tool_use_id error.

Root Cause Chain

Stream abort (pi-ai/anthropic.js)
  → partialJson left on toolCall block, not cleaned up
  → Aborted assistant persisted to JSONL (agent-session.js)
  → Synthetic tool_result inserted for orphaned call (extensionAPI.js guard)
  → repairToolUseResultPairing sees it as "valid pairing" on reload
  → transformMessages DROPS aborted assistant (stopReason === "aborted")
  → But KEEPS the synthetic tool_result → orphaned → API 400

The mismatch: the persistence path treats the pair as valid, while the API-call path drops one half of it. Nothing reconciles the two.
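To make the broken invariant concrete, here is a minimal sketch that scans a message list and reports tool results whose tool call would not survive to the API call. The function name findOrphanedToolResults and the sample data are hypothetical. The message shape (role, stopReason, content[].type, id, toolCallId) is assumed from the snippets in this document, not taken from the pi-ai type definitions.

```javascript
// Sketch only: field names are assumed from the snippets in this document.
function findOrphanedToolResults(messages) {
    const liveToolCallIds = new Set();
    const orphans = [];
    for (const msg of messages) {
        if (msg.role === "assistant") {
            // Aborted/errored assistants are dropped before the API call,
            // so their tool calls never reach the provider.
            if (msg.stopReason === "aborted" || msg.stopReason === "error") continue;
            const blocks = Array.isArray(msg.content) ? msg.content : [];
            for (const block of blocks) {
                if (block.type === "toolCall" && block.id) liveToolCallIds.add(block.id);
            }
        } else if (msg.role === "toolResult" && !liveToolCallIds.has(msg.toolCallId)) {
            // A result with no surviving tool call is what the API rejects.
            orphans.push(msg.toolCallId);
        }
    }
    return orphans;
}

// Hypothetical session: aborted assistant followed by a synthetic result.
const sampleSession = [
    { role: "user", content: "list files" },
    { role: "assistant", stopReason: "aborted",
      content: [{ type: "toolCall", id: "call_1", partialJson: "{\"pa" }] },
    { role: "toolResult", toolCallId: "call_1", content: "synthetic" },
];
console.log(findOrphanedToolResults(sampleSession)); // [ 'call_1' ]
```

A check like this could run against a loaded session before any request is sent, which would turn the opaque API 400 into a local, explainable error.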

Fix 1: Patch transform-messages.js (the actual fix)

File: node_modules/@mariozechner/pi-ai/dist/providers/transform-messages.js

Current code (simplified):

// Second pass: handle aborted/errored messages
for (let i = 0; i < transformed.length; i++) {
    const msg = transformed[i];
    if (msg.role === "assistant") {
        const assistantMsg = msg;
        if (assistantMsg.stopReason === "error" || assistantMsg.stopReason === "aborted") {
            continue; // ← DROPS assistant but forgets about its tool_use_ids
        }
        // ... track tool calls
        result.push(msg);
    }
    // ... handle other roles (toolResult passes through unchecked)
}

Proposed patch:

// Second pass: handle aborted/errored messages
const droppedToolCallIds = new Set();  // ← NEW

for (let i = 0; i < transformed.length; i++) {
    const msg = transformed[i];
    if (msg.role === "assistant") {
        const assistantMsg = msg;
        if (assistantMsg.stopReason === "error" || assistantMsg.stopReason === "aborted") {
            // Collect tool_use_ids from the dropped message
            for (const block of assistantMsg.content) {        // ← NEW
                if (block.type === "toolCall" && block.id) {   // ← NEW
                    droppedToolCallIds.add(block.id);           // ← NEW
                }                                               // ← NEW
            }                                                   // ← NEW
            continue;
        }
        // ... track tool calls
        result.push(msg);
    } else if (msg.role === "toolResult") {
        // Skip tool results whose tool call was dropped (aborted/errored)
        if (droppedToolCallIds.has(msg.toolCallId)) {          // ← NEW
            continue;                                           // ← NEW
        }                                                       // ← NEW
        // ... existing toolResult handling
    }
    // ...
}

Impact: about 9 lines added. Low risk: behavior changes only for tool results whose tool call belonged to an assistant message that was already being dropped. All other messages pass through as before.
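The patch can be exercised in isolation with a standalone sketch of the second pass. filterDroppedToolResults is a hypothetical name (the real logic sits inside transformMessages()), and the sample messages and the "toolUse" stop reason are made up for illustration. Only the drop-and-skip behavior mirrors the proposed patch.

```javascript
// Standalone sketch of the patched second pass; not the actual pi-ai code.
function filterDroppedToolResults(transformed) {
    const result = [];
    const droppedToolCallIds = new Set();
    for (const msg of transformed) {
        if (msg.role === "assistant") {
            if (msg.stopReason === "error" || msg.stopReason === "aborted") {
                // Remember the tool calls that disappear with this message.
                const blocks = Array.isArray(msg.content) ? msg.content : [];
                for (const block of blocks) {
                    if (block.type === "toolCall" && block.id) droppedToolCallIds.add(block.id);
                }
                continue;
            }
        } else if (msg.role === "toolResult" && droppedToolCallIds.has(msg.toolCallId)) {
            // Skip results whose tool call was dropped above.
            continue;
        }
        result.push(msg);
    }
    return result;
}

// Hypothetical history: one aborted exchange, then one healthy exchange.
const historyBefore = [
    { role: "user", content: "list files" },
    { role: "assistant", stopReason: "aborted",
      content: [{ type: "toolCall", id: "call_1", partialJson: "{\"pa" }] },
    { role: "toolResult", toolCallId: "call_1", content: "synthetic" },
    { role: "user", content: "try again" },
    { role: "assistant", stopReason: "toolUse",
      content: [{ type: "toolCall", id: "call_2" }] },
    { role: "toolResult", toolCallId: "call_2", content: "a.txt" },
];
const historyAfter = filterDroppedToolResults(historyBefore);
console.log(historyAfter.map((m) => m.role));
// [ 'user', 'user', 'assistant', 'toolResult' ]
```

The healthy call_2 pair is untouched, which is the property the low-risk claim rests on.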

Fix 2: guardian.sh — Session Watchdog (safety net)

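Runs from the user's crontab every 5 minutes (the script restarts the gateway with systemctl --user, so it must run as that user). It detects the toxic pattern in session JSONL files and auto-repairs them.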

#!/usr/bin/env bash
# ~/.openclaw/guardian.sh — Session integrity watchdog
# Install: crontab -e → */5 * * * * ~/.openclaw/guardian.sh >> /tmp/guardian.log 2>&1

set -euo pipefail

SESSIONS_DIR="$HOME/.openclaw/agents/main/sessions"
LOCK="/tmp/guardian.lock"

# Prevent concurrent runs
exec 9>"$LOCK"
flock -n 9 || exit 0

log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"; }

# Scan active session files for the toxic pattern:
# An assistant message with stopReason "aborted" + toolCall blocks,
# followed by a toolResult referencing the same toolCallId
for session_file in "$SESSIONS_DIR"/*.jsonl; do
    [ -f "$session_file" ] || continue
    
    # Quick grep: does this file have both "aborted" and "synthetic" markers?
    if ! grep -q '"aborted"' "$session_file" 2>/dev/null; then
        continue
    fi
    if ! grep -q 'synthetic' "$session_file" 2>/dev/null; then
        continue
    fi
    
    log "WARN: Potential toxic pattern in $(basename "$session_file")"
    
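    # Extract tool_use_ids from aborted assistant messages.
    # The path is passed as an argument so quotes or spaces in it cannot break the Python source.
    aborted_ids=$(python3 -c "
import json, sys
ids = set()
with open(sys.argv[1]) as f:
    for line in f:
        try:
            msg = json.loads(line)
            if msg.get('role') == 'assistant' and msg.get('stopReason') == 'aborted':
                for block in msg.get('content', []):
                    if block.get('type') == 'toolCall' and block.get('id'):
                        ids.add(block['id'])
        except (ValueError, AttributeError, TypeError):
            pass  # skip lines that are not JSON objects of the expected shape
print(' '.join(sorted(ids)))
" "$session_file" 2>/dev/null)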
    
    if [ -z "$aborted_ids" ]; then
        continue
    fi
    
    log "Found aborted tool_use_ids: $aborted_ids"
    
    # Check if there are orphaned toolResults for these IDs
    needs_repair=false
    for tid in $aborted_ids; do
        if grep -q "\"toolCallId\":\"$tid\"" "$session_file"; then
            needs_repair=true
            break
        fi
    done
    
    if [ "$needs_repair" = true ]; then
        log "REPAIRING: Removing aborted assistant + orphaned toolResults"
        
        # Backup
        cp "$session_file" "${session_file}.bak.$(date +%s)"
        
        # Remove aborted assistant messages and the toolResults that reference their tool calls.
        # Path and IDs are passed as arguments; the file is replaced atomically via a temp file.
        python3 -c "
import json, os, sys

path = sys.argv[1]
aborted_ids = set(sys.argv[2:])
clean = []
removed = 0

with open(path) as f:
    for line in f:
        try:
            msg = json.loads(line)
            role = msg.get('role')
        except (ValueError, AttributeError):
            clean.append(line)  # keep anything we cannot parse
            continue
        # Drop aborted assistant messages
        if role == 'assistant' and msg.get('stopReason') == 'aborted':
            removed += 1
            continue
        # Drop toolResults referencing aborted tool calls
        if role == 'toolResult' and msg.get('toolCallId') in aborted_ids:
            removed += 1
            continue
        clean.append(line)

tmp = path + '.tmp'
with open(tmp, 'w') as f:
    f.writelines(clean)
os.replace(tmp, path)
print(f'Removed {removed} corrupted entries')
" "$session_file" $aborted_ids
        
        log "Repair complete. Restarting gateway..."
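        export PATH="$HOME/.npm-global/bin:$PATH"
        # cron usually does not set XDG_RUNTIME_DIR, which systemctl --user needs to reach the user manager
        export XDG_RUNTIME_DIR="${XDG_RUNTIME_DIR:-/run/user/$(id -u)}"
        if systemctl --user restart openclaw-gateway 2>/dev/null; then
            log "Gateway restarted"
        else
            log "ERROR: gateway restart failed"
        fi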
    fi
done

log "Scan complete"
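For testing the repair logic outside of cron, the same filtering can be sketched as a pure function in Node. repairSessionJsonl is a hypothetical helper, not part of OpenClaw. It assumes, as guardian.sh does, one JSON message per line with role, stopReason and toolCallId as top-level fields, and it leaves unparseable lines untouched.

```javascript
// Pure-function sketch of the guardian repair step; hypothetical helper.
function repairSessionJsonl(text) {
    const lines = text.split("\n");
    const parsed = lines.map((line) => {
        try { return JSON.parse(line); } catch { return null; }
    });
    const isAborted = (m) => m?.role === "assistant" && m.stopReason === "aborted";

    // First pass: collect tool call ids owned by aborted assistant messages.
    const abortedIds = new Set();
    for (const m of parsed) {
        if (!isAborted(m) || !Array.isArray(m.content)) continue;
        for (const block of m.content) {
            if (block?.type === "toolCall" && block.id) abortedIds.add(block.id);
        }
    }

    // Second pass: drop aborted assistants and the results that reference them.
    let removed = 0;
    const kept = lines.filter((line, i) => {
        const m = parsed[i];
        const drop = isAborted(m) ||
            (m?.role === "toolResult" && abortedIds.has(m.toolCallId));
        if (drop) removed++;
        return !drop;
    });
    return { text: kept.join("\n"), removed };
}

// Hypothetical three-line session with a trailing newline.
const corrupted = [
    JSON.stringify({ role: "user", content: "list files" }),
    JSON.stringify({ role: "assistant", stopReason: "aborted",
        content: [{ type: "toolCall", id: "call_1" }] }),
    JSON.stringify({ role: "toolResult", toolCallId: "call_1", content: "synthetic" }),
    "",
].join("\n");
const repaired = repairSessionJsonl(corrupted);
console.log(repaired.removed); // 2
```

Keeping the repair free of file I/O makes it easy to run against copies of real session files before trusting the watchdog with live ones.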

Fix 3: Upstream Issue (permanent fix)

File a GitHub issue on openclaw/openclaw with:

  • The 6-layer analysis from Piotr's gist
  • The transform-messages.js patch
  • A minimal reproduction case: force context overflow → abort during tool_use streaming → observe orphaned tool_result

This ensures the fix gets into the codebase properly so we don't need to re-patch after every openclaw update.

Deployment Order

  1. Now: Apply transform-messages.js patch (prevents new corruption)
  2. Now: If any active sessions are currently corrupted, run guardian repair manually
  3. Now: Install guardian.sh in the user's crontab (catches regressions)
  4. Soon: File upstream issue with the patch + analysis

Risk Assessment

| Fix | Risk | Rollback |
|---|---|---|
| transform-messages.js patch | Minimal: only affects tool results of assistant messages that were already dropped | Revert the file or reinstall with npm install |
| guardian.sh | Minimal: only modifies files it backs up first | Restore from .bak files |
| Upstream issue | Zero: just documentation | N/A |

What This Does NOT Fix

  • The partialJson leak in pi-ai/anthropic.js (Bug 1) — ideally partialJson should be cleaned up on abort too, but the transform-messages fix makes it harmless
  • Session files that are already corrupted need manual repair or guardian.sh
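For reference, cleaning up partialJson on abort could look roughly like the sketch below. stripPartialJson is a hypothetical helper, and where it would be called inside pi-ai/anthropic.js is not shown in this document, so treat the placement and the block shape as assumptions.

```javascript
// Hypothetical cleanup: remove partialJson from toolCall blocks of a message.
// Returns a new message object and leaves the input untouched.
function stripPartialJson(message) {
    if (!Array.isArray(message.content)) return message;
    return {
        ...message,
        content: message.content.map((block) => {
            if (block.type !== "toolCall" || !("partialJson" in block)) return block;
            const { partialJson, ...rest } = block; // discard the partial payload
            return rest;
        }),
    };
}

// Made-up aborted message with one text block and one partial tool call.
const abortedMessage = {
    role: "assistant",
    stopReason: "aborted",
    content: [
        { type: "text", text: "Let me check" },
        { type: "toolCall", id: "call_1", name: "ls", partialJson: "{\"pa" },
    ],
};
const cleanedMessage = stripPartialJson(abortedMessage);
console.log("partialJson" in cleanedMessage.content[1]); // false
```

This only removes the leaked field. It does not by itself prevent the orphaned tool_result, which is why the transform-messages.js patch remains the actual fix.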