Definition: "I can't parse what you wrote" - violations of the language's structural rules that prevent parsing. These are grammatical/structural errors independent of tool definition or user prompt.
Characteristics:
- Structure is fundamentally broken
- Parser cannot extract function calls
- Independent of tool definitions
This category covers any malformed function-calling attempt: the parser can tell that an attempt was made, but the structure is invalid.
Examples:
<!-- Missing closing tag -->
<function_calls>
<invoke name="get_current_weather">
<parameter name="location">["San Francisco", "CA"]</parameter>
<!-- Missing </invoke> -->
</function_calls>
<!-- Missing closing function_calls tag -->
<function_calls>
<invoke name="get_current_weather">
<parameter name="location">["San Francisco", "CA"]</parameter>
</invoke>
<!-- Missing </function_calls> -->
Current implementation mapping:
- BadFunctionCalls → Penalty: FCUReactorInvalidFunctionCall (parser found a valid opening <function_calls> tag but the content is malformed)
- None → Penalty: FCUReactorUnparseableFunctionCall (parser cannot find a valid opening <function_calls> tag: missing, corrupted, or namespace mismatch)
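The two cases above can be distinguished with a minimal sketch, assuming the block is treated as XML. The function name `classify_structure` and the string labels are illustrative, not the real implementation:

```python
# Sketch: distinguishing "unparseable" (no opening tag found) from
# "invalid" (opening tag found, but malformed content inside).
# classify_structure and its return labels are hypothetical names.
import xml.etree.ElementTree as ET

def classify_structure(text: str) -> str:
    if "<function_calls>" not in text:
        # No valid opening tag at all: the parser cannot even find an attempt.
        return "unparseable"
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Opening tag found, but the structure is broken,
        # e.g. a missing </invoke> or </function_calls>.
        return "invalid"
    return "ok"

# Reproduces the first example above: </invoke> is missing.
broken = """<function_calls>
<invoke name="get_current_weather">
<parameter name="location">["San Francisco", "CA"]</parameter>
</function_calls>"""

print(classify_structure(broken))  # invalid
```

A real parser would carry more detail (byte offsets, the specific tag that failed), but the branch structure mirrors the two penalty cases.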
A syntactically valid <function_calls> block that contains no <invoke> elements.
Example:
<function_calls>
<!-- No invoke tags inside -->
</function_calls>
Current implementation mapping: [] (empty list) → NO PENALTY (BUG!)
Cannot parse parameter values due to malformed JSON or invalid characters.
Example:
<function_calls>
<invoke name="get_current_weather">
<parameter name="location">{invalid json}</parameter>
</invoke>
</function_calls>
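The failure mode is visible with plain `json.loads`: `{invalid json}` has unquoted content, so decoding raises. The helper name `parse_parameter` is an assumption for illustration:

```python
# Sketch: why "{invalid json}" fails parameter parsing. json.loads
# rejects unquoted keys/values, so the value cannot be decoded.
import json

def parse_parameter(raw: str):
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None  # signals a bad-argument failure to the scorer

print(parse_parameter('["San Francisco", "CA"]'))  # valid JSON list
print(parse_parameter('{invalid json}'))           # None
```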
Current implementation mapping: BadArgumentOrName + "not valid JSON" → Penalty: FCUReactorInvalidToolName
Low-level tokenization failures.
What it is: Occurs when the model generates a function call that starts correctly (a valid <function_calls> tag) but whose final token is malformed because the model misjudges token boundaries. Most commonly seen with the J49 tokenizer when the closing > character is not properly tokenized.
Why it's heavily penalized (-2.0): This is a fundamental model training issue where the model doesn't understand how the tokenizer splits text. It needs to learn proper token boundaries to generate parseable output. The high penalty helps the model learn this critical skill.
Current implementation mapping: Tokenization error → Penalty: FCUReactorTokenizationError
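The mappings in this section can be summarized as a small dispatch table. Only the -2.0 tokenization weight is stated above; the table keys and the dictionary-based shape are assumptions for illustration:

```python
# Sketch: failure class -> penalty name, per the mappings above.
# Keys are illustrative labels; penalty names come from the text.
PENALTIES = {
    "unparseable": "FCUReactorUnparseableFunctionCall",  # no opening tag
    "invalid": "FCUReactorInvalidFunctionCall",          # malformed content
    "bad_argument": "FCUReactorInvalidToolName",         # per current mapping
    "tokenization": "FCUReactorTokenizationError",       # weighted -2.0
}

def penalty_for(error_kind: str):
    # Note the gap: an empty invocation list has no key here, so it
    # falls through to None -- the no-penalty bug flagged above.
    return PENALTIES.get(error_kind)

print(penalty_for("tokenization"))  # FCUReactorTokenizationError
```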