Skip to content

Instantly share code, notes, and snippets.

@TurtleShip
Created July 31, 2025 19:38
Show Gist options
  • Save TurtleShip/87ab343a547eec4f3d87abe6a3bb4e61 to your computer and use it in GitHub Desktop.
Save TurtleShip/87ab343a547eec4f3d87abe6a3bb4e61 to your computer and use it in GitHub Desktop.
Syntax error explanation

Syntax Errors

Definition: "I can't parse what you wrote" - violations of the language's structural rules that prevent parsing. These are grammatical/structural errors independent of tool definition or user prompt.

Characteristics:

  1. Structure is fundamentally broken
  2. Parser cannot extract function calls
  3. Independent of tool definitions

MalformedFunctionCall

Any malformed attempt at function calling where the parser can identify an attempt was made but the structure is invalid.

Examples:

<!-- Missing closing tag -->
<function_calls>
<invoke name="get_current_weather">
<parameter name="location">["San Francisco", "CA"]</parameter>
<!-- Missing </invoke> -->
</function_calls>

<!-- Missing closing function_calls tag -->
<function_calls>
<invoke name="get_current_weather">
<parameter name="location">["San Francisco", "CA"]</parameter>
</invoke>
<!-- Missing </function_calls> -->

Current implementation mapping:

  • BadFunctionCalls → Penalty: FCUReactorInvalidFunctionCall (parser found valid opening <function_calls> tag but content is malformed) ( None → Penalty: FCUReactorUnparseableFunctionCall (parser cannot find valid opening <function_calls> tag - missing, corrupted, or namespace mismatch)

EmptyFunctionCall

Valid tags but no invocations inside.

Example:

<function_calls>
<!-- No invoke tags inside -->
</function_calls>

Current implementation mapping: [] (empty list) → NO PENALTY (BUG!)

InvalidParameterSyntax

Cannot parse parameter values due to malformed JSON or invalid characters.

Example:

<function_calls>
<invoke name="get_current_weather">
<parameter name="location">{invalid json}</parameter>
</invoke>
</function_calls>

Current implementation mapping: BadArgumentOrName + "not valid JSON" → Penalty: FCUReactorInvalidToolName

TokenizationError

Low-level tokenization failures.

What it is: Occurs when the model generates a function call that starts correctly (valid <function_calls> tag) but the final token is malformed due to incorrect understanding of token boundaries. Most commonly seen with the J49 tokenizer when the closing > character is not properly tokenized.

Why it's heavily penalized (-2.0): This is a fundamental model training issue where the model doesn't understand how the tokenizer splits text. It needs to learn proper token boundaries to generate parseable output. The high penalty helps the model learn this critical skill.

Current implementation mapping: Tokenization error → Penalty: FCUReactorTokenizationError

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment