When an assistant message is aborted mid-stream (e.g., after context compaction at 200k tokens), its toolCall blocks are persisted to the session JSONL with partialJson still attached. OpenClaw's repair mechanism then inserts synthetic tool_result entries for them. At API call time, however, transformMessages() drops the aborted assistant entirely (stopReason === "aborted") while keeping the synthetic tool_result. The result is an orphaned tool_result that makes every subsequent API call fail with an unexpected tool_use_id error.
Stream abort (pi-ai/anthropic.js)
→ partialJson left on toolCall block, not cleaned up
→ Aborted assistant persisted to JSONL (agent-session.js)
→ Synthetic tool_result inserted for orphaned call (extensionAPI.js guard)
→ repairToolUseResultPairing sees it as "valid pairing" on reload
→ transformMessages DROPS aborted assistant (stopReason === "aborted")
→ But KEEPS the synthetic tool_result → orphaned → API 400
The mismatch: persistence-time treats the pair as valid, API-call-time drops one half. Nobody reconciles.
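The mismatch can be made concrete with a small check that neither layer currently runs. The following is a minimal sketch, not code from OpenClaw or pi-ai: `findOrphanedToolResults` is a hypothetical helper, the sample messages are invented, and the field names (`role`, `content`, `type`, `id`, `toolCallId`, `stopReason`) are taken from the snippets in this write-up.

```javascript
// Hypothetical helper: list toolResult messages whose toolCall does not
// appear in any preceding assistant message.
function findOrphanedToolResults(messages) {
    const seenToolCallIds = new Set();
    const orphans = [];
    for (const msg of messages) {
        if (msg.role === "assistant" && Array.isArray(msg.content)) {
            for (const block of msg.content) {
                if (block.type === "toolCall" && block.id) {
                    seenToolCallIds.add(block.id);
                }
            }
        } else if (msg.role === "toolResult" && !seenToolCallIds.has(msg.toolCallId)) {
            orphans.push(msg.toolCallId);
        }
    }
    return orphans;
}

// Invented sample session: an aborted assistant plus its synthetic result.
const persisted = [
    { role: "user", content: "list files" },
    {
        role: "assistant",
        stopReason: "aborted",
        content: [{ type: "toolCall", id: "call_1", name: "ls", partialJson: '{"pa' }],
    },
    { role: "toolResult", toolCallId: "call_1", content: "synthetic: tool call aborted" },
];

// Persistence-time view: the pair is intact, so nothing is flagged.
const orphansAtPersistence = findOrphanedToolResults(persisted);

// API-call-time view: the aborted assistant is filtered out first
// (a simplified stand-in for what transformMessages does today).
const sentToApi = persisted.filter(
    (m) => !(m.role === "assistant" && (m.stopReason === "aborted" || m.stopReason === "error"))
);
const orphansAtApiCall = findOrphanedToolResults(sentToApi);

console.log(orphansAtPersistence); // []
console.log(orphansAtApiCall);     // [ 'call_1' ]
```

The same message list passes the check before the drop and fails it after, which is the gap the patch below closes.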
File: node_modules/@mariozechner/pi-ai/dist/providers/transform-messages.js
Current code (simplified):
```javascript
// Second pass: handle aborted/errored messages
for (let i = 0; i < transformed.length; i++) {
    const msg = transformed[i];
    if (msg.role === "assistant") {
        const assistantMsg = msg;
        if (assistantMsg.stopReason === "error" || assistantMsg.stopReason === "aborted") {
            continue; // ← DROPS assistant but forgets about its tool_use_ids
        }
        // ... track tool calls
        result.push(msg);
    }
    // ... handle other roles (toolResult passes through unchecked)
}
```

Proposed patch:
```javascript
// Second pass: handle aborted/errored messages
const droppedToolCallIds = new Set(); // ← NEW
for (let i = 0; i < transformed.length; i++) {
    const msg = transformed[i];
    if (msg.role === "assistant") {
        const assistantMsg = msg;
        if (assistantMsg.stopReason === "error" || assistantMsg.stopReason === "aborted") {
            // Collect tool_use_ids from the dropped message
            for (const block of assistantMsg.content) { // ← NEW
                if (block.type === "toolCall" && block.id) { // ← NEW
                    droppedToolCallIds.add(block.id); // ← NEW
                } // ← NEW
            } // ← NEW
            continue;
        }
        // ... track tool calls
        result.push(msg);
    } else if (msg.role === "toolResult") {
        // Skip tool results whose tool call was dropped (aborted/errored)
        if (droppedToolCallIds.has(msg.toolCallId)) { // ← NEW
            continue; // ← NEW
        } // ← NEW
        // ... existing toolResult handling
    }
    // ...
}
```

Impact: ~8 lines added. Minimal risk to existing behavior: the change only affects the case where an assistant message was already being dropped.
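To try the patched logic outside the library, the pass can be approximated as a pure function. This is a sketch under stated assumptions, not the pi-ai implementation: `dropAbortedWithResults` is a hypothetical name, the elided "track tool calls" and "existing toolResult handling" steps are reduced to a plain push, and the `"toolUse"` stop reason and sample messages are invented for illustration.

```javascript
// Simplified stand-in for the patched second pass (hypothetical helper).
function dropAbortedWithResults(transformed) {
    const result = [];
    const droppedToolCallIds = new Set();
    for (const msg of transformed) {
        if (msg.role === "assistant") {
            if (msg.stopReason === "error" || msg.stopReason === "aborted") {
                // Remember the ids so the matching results can be skipped too.
                for (const block of msg.content ?? []) {
                    if (block.type === "toolCall" && block.id) {
                        droppedToolCallIds.add(block.id);
                    }
                }
                continue;
            }
            result.push(msg);
        } else if (msg.role === "toolResult") {
            if (droppedToolCallIds.has(msg.toolCallId)) {
                continue; // its tool call was dropped above
            }
            result.push(msg);
        } else {
            result.push(msg);
        }
    }
    return result;
}

// Invented sample: one healthy tool round-trip, then an aborted one.
const input = [
    { role: "user", content: "read a.txt then b.txt" },
    { role: "assistant", stopReason: "toolUse", content: [{ type: "toolCall", id: "ok_1", name: "read" }] },
    { role: "toolResult", toolCallId: "ok_1", content: "contents of a.txt" },
    { role: "assistant", stopReason: "aborted", content: [{ type: "toolCall", id: "bad_1", name: "read", partialJson: '{"p' }] },
    { role: "toolResult", toolCallId: "bad_1", content: "synthetic result" },
    { role: "user", content: "continue" },
];

const output = dropAbortedWithResults(input);
console.log(output.map((m) => m.role));
// [ 'user', 'assistant', 'toolResult', 'user' ]
```

The healthy pair (`ok_1`) survives while both halves of the aborted pair (`bad_1`) are removed, so the remaining list contains no unpaired tool_result.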
System crontab, every 5 minutes. Detects the toxic pattern in session JSONL files and auto-repairs.
```bash
#!/usr/bin/env bash
# ~/.openclaw/guardian.sh: session integrity watchdog
# Install: crontab -e → */5 * * * * ~/.openclaw/guardian.sh >> /tmp/guardian.log 2>&1
set -euo pipefail

SESSIONS_DIR="$HOME/.openclaw/agents/main/sessions"
LOCK="/tmp/guardian.lock"

# Prevent concurrent runs
exec 9>"$LOCK"
flock -n 9 || exit 0

log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"; }

# Scan active session files for the toxic pattern:
# An assistant message with stopReason "aborted" + toolCall blocks,
# followed by a toolResult referencing the same toolCallId
for session_file in "$SESSIONS_DIR"/*.jsonl; do
    [ -f "$session_file" ] || continue

    # Quick grep: does this file have both "aborted" and "synthetic" markers?
    if ! grep -q '"aborted"' "$session_file" 2>/dev/null; then
        continue
    fi
    if ! grep -q 'synthetic' "$session_file" 2>/dev/null; then
        continue
    fi

    log "WARN: Potential toxic pattern in $(basename "$session_file")"

    # Extract tool_use_ids from aborted assistant messages.
    # The path is passed as an argument so quotes in it cannot break the Python source.
    aborted_ids=$(python3 -c "
import json, sys
ids = set()
with open(sys.argv[1]) as f:
    for line in f:
        try:
            msg = json.loads(line)
            if msg.get('role') == 'assistant' and msg.get('stopReason') == 'aborted':
                for block in msg.get('content', []):
                    if block.get('type') == 'toolCall' and block.get('id'):
                        ids.add(block['id'])
        except Exception:
            pass  # skip unparseable or unexpectedly shaped lines
print(' '.join(sorted(ids)))
" "$session_file" 2>/dev/null)

    if [ -z "$aborted_ids" ]; then
        continue
    fi

    log "Found aborted tool_use_ids: $aborted_ids"

    # Check if there are orphaned toolResults for these IDs
    needs_repair=false
    for tid in $aborted_ids; do
        if grep -qF "\"toolCallId\":\"$tid\"" "$session_file"; then
            needs_repair=true
            break
        fi
    done

    if [ "$needs_repair" = true ]; then
        log "REPAIRING: Removing aborted assistant + orphaned toolResults"

        # Backup
        cp "$session_file" "${session_file}.bak.$(date +%s)"

        # Remove lines containing aborted assistant messages and their orphaned toolResults
        python3 -c "
import json, sys
path, aborted_ids = sys.argv[1], set(sys.argv[2].split())
with open(path) as f:
    lines = f.readlines()
clean = []
removed = 0
for line in lines:
    try:
        msg = json.loads(line)
        # Drop aborted assistant messages
        if msg.get('role') == 'assistant' and msg.get('stopReason') == 'aborted':
            removed += 1
            continue
        # Drop toolResults referencing aborted tool calls
        if msg.get('role') == 'toolResult' and msg.get('toolCallId') in aborted_ids:
            removed += 1
            continue
    except Exception:
        pass  # keep lines that are not valid JSON objects
    clean.append(line)
with open(path, 'w') as f:
    f.writelines(clean)
print(f'Removed {removed} corrupted entries')
" "$session_file" "$aborted_ids"

        log "Repair complete. Restarting gateway..."
        export PATH="$HOME/.npm-global/bin:$PATH"
        systemctl --user restart openclaw-gateway 2>/dev/null || true
        log "Gateway restarted"
    fi
done

log "Scan complete"
```

File a GitHub issue on openclaw/openclaw with:
- The 6-layer analysis from Piotr's gist
- The `transform-messages.js` patch
- A minimal reproduction case: force context overflow → abort during tool_use streaming → observe orphaned tool_result
This ensures the fix gets into the codebase properly so we don't need to re-patch after every openclaw update.
- Now: Apply `transform-messages.js` patch (prevents new corruption)
- Now: If any active sessions are currently corrupted, run guardian repair manually
- Now: Install `guardian.sh` in system crontab (catches regressions)
- Soon: File upstream issue with the patch + analysis
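Before running a repair by hand, it may help to see what would be removed without touching the file. The following is a read-only sketch, not part of guardian.sh: `scanSessionJsonl` is a hypothetical helper, and it assumes the same record shape the guardian script assumes (one JSON object per line with top-level `role`, `stopReason`, `content`, and `toolCallId` fields). The sample lines are invented.

```javascript
// Hypothetical dry-run scanner: reports affected ids and line numbers only.
function scanSessionJsonl(text) {
    const entries = text.split("\n").map((line, index) => {
        try {
            return { lineNumber: index + 1, msg: JSON.parse(line) };
        } catch {
            return { lineNumber: index + 1, msg: null }; // blank or non-JSON line
        }
    });

    const abortedIds = new Set();
    const abortedLines = [];
    for (const { lineNumber, msg } of entries) {
        if (msg && msg.role === "assistant" && msg.stopReason === "aborted") {
            abortedLines.push(lineNumber);
            for (const block of Array.isArray(msg.content) ? msg.content : []) {
                if (block && block.type === "toolCall" && block.id) {
                    abortedIds.add(block.id);
                }
            }
        }
    }

    // toolResult lines that reference a tool call from an aborted assistant.
    const orphanedResultLines = entries
        .filter(({ msg }) => msg && msg.role === "toolResult" && abortedIds.has(msg.toolCallId))
        .map(({ lineNumber }) => lineNumber);

    return { abortedIds: [...abortedIds], abortedLines, orphanedResultLines };
}

// Invented three-line session for illustration.
const sample = [
    JSON.stringify({ role: "user", content: "hi" }),
    JSON.stringify({ role: "assistant", stopReason: "aborted", content: [{ type: "toolCall", id: "call_9" }] }),
    JSON.stringify({ role: "toolResult", toolCallId: "call_9", content: "synthetic" }),
].join("\n");

const report = scanSessionJsonl(sample);
console.log(report);
```

In Node, the text for a real session could come from `fs.readFileSync(path, "utf8")`. An empty `orphanedResultLines` array would suggest the file does not need the repair step.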
| Fix | Risk | Rollback |
|---|---|---|
| transform-messages.js patch | Minimal — only affects already-dropped messages | Revert file or npm install |
| guardian.sh | Minimal — only modifies files it backs up first | Restore from .bak files |
| Upstream issue | Zero — just documentation | N/A |
- The `partialJson` leak in `pi-ai/anthropic.js` (Bug 1): ideally `partialJson` should be cleaned up on abort too, but the transform-messages fix makes it harmless
- Session files that are already corrupted need manual repair or guardian.sh