Definition: "I can't parse what you wrote" - violations of the language's structural rules that prevent parsing. These are grammatical/structural errors independent of tool definition or user prompt.
Characteristics:
- Structure is fundamentally broken
- Parser cannot extract function calls
- Independent of tool definitions
This category covers any malformed function-calling attempt: the parser can tell that an attempt was made, but the structure is invalid.
Examples:
<!-- Missing closing tag -->
<function_calls>
<invoke name="get_current_weather">
<parameter name="location">["San Francisco", "CA"]</parameter>
<!-- Missing </invoke> -->
</function_calls>
<!-- Missing closing function_calls tag -->
<function_calls>
<invoke name="get_current_weather">
<parameter name="location">["San Francisco", "CA"]</parameter>
</invoke>
<!-- Missing </function_calls> -->
Current implementation mapping:
- BadFunctionCalls → Penalty: FCUReactorInvalidFunctionCall (parser found a valid opening <function_calls> tag but the content is malformed)
- None → Penalty: FCUReactorUnparseableFunctionCall (parser cannot find a valid opening <function_calls> tag: missing, corrupted, or namespace mismatch)
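The two cases above can be distinguished with a minimal sketch, assuming the block is treated as XML. The function name `classify_structure` and the string labels are illustrative, not the real implementation:

```python
# Sketch: distinguishing "unparseable" (no opening tag found) from
# "invalid" (opening tag found, but malformed content inside).
# classify_structure and its return labels are hypothetical names.
import xml.etree.ElementTree as ET

def classify_structure(text: str) -> str:
    if "<function_calls>" not in text:
        # No valid opening tag at all: the parser cannot even find an attempt.
        return "unparseable"
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Opening tag found, but the structure is broken,
        # e.g. a missing </invoke> or </function_calls>.
        return "invalid"
    return "ok"

# Reproduces the first example above: </invoke> is missing.
broken = """<function_calls>
<invoke name="get_current_weather">
<parameter name="location">["San Francisco", "CA"]</parameter>
</function_calls>"""

print(classify_structure(broken))  # invalid
```

A real parser would carry more detail (byte offsets, the specific tag that failed), but the branch structure mirrors the two penalty cases.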
A syntactically valid <function_calls> block that contains no <invoke> elements.
Example:
<function_calls>
<!-- No invoke tags inside -->
</function_calls>
Current implementation mapping: [] (empty list) → NO PENALTY (BUG!)
Cannot parse parameter values due to malformed JSON or invalid characters.
Example:
<function_calls>
<invoke name="get_current_weather">
<parameter name="location">{invalid json}</parameter>
</invoke>
</function_calls>
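The failure mode is visible with plain `json.loads`: `{invalid json}` has unquoted content, so decoding raises. The helper name `parse_parameter` is an assumption for illustration:

```python
# Sketch: why "{invalid json}" fails parameter parsing. json.loads
# rejects unquoted keys/values, so the value cannot be decoded.
import json

def parse_parameter(raw: str):
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None  # signals a bad-argument failure to the scorer

print(parse_parameter('["San Francisco", "CA"]'))  # valid JSON list
print(parse_parameter('{invalid json}'))           # None
```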
Current implementation mapping: BadArgumentOrName + "not valid JSON" → Penalty: FCUReactorInvalidToolName
Low-level tokenization failures.
What it is: Occurs when the model generates a function call that starts correctly (a valid <function_calls> tag) but whose final token is malformed because the model misjudges token boundaries. Most commonly seen with the J49 tokenizer when the closing > character is not properly tokenized.
Why it's heavily penalized (-2.0): This is a fundamental model training issue where the model doesn't understand how the tokenizer splits text. It needs to learn proper token boundaries to generate parseable output. The high penalty helps the model learn this critical skill.
Current implementation mapping: Tokenization error → Penalty: FCUReactorTokenizationError
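The mappings in this section can be summarized as a small dispatch table. Only the -2.0 tokenization weight is stated above; the table keys and the dictionary-based shape are assumptions for illustration:

```python
# Sketch: failure class -> penalty name, per the mappings above.
# Keys are illustrative labels; penalty names come from the text.
PENALTIES = {
    "unparseable": "FCUReactorUnparseableFunctionCall",  # no opening tag
    "invalid": "FCUReactorInvalidFunctionCall",          # malformed content
    "bad_argument": "FCUReactorInvalidToolName",         # per current mapping
    "tokenization": "FCUReactorTokenizationError",       # weighted -2.0
}

def penalty_for(error_kind: str):
    # Note the gap: an empty invocation list has no key here, so it
    # falls through to None -- the no-penalty bug flagged above.
    return PENALTIES.get(error_kind)

print(penalty_for("tokenization"))  # FCUReactorTokenizationError
```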