TOON · GitHub

TOON, or Token-Oriented Object Notation, is a lightweight, human-readable data serialization format designed for use with Large Language Models (LLMs). It is a more compact alternative to JSON that reduces token consumption in prompts, so structured data can be passed to AI systems without losing information. TOON is most effective for uniform tabular data, such as arrays of objects with consistent fields, where benchmarks using common tokenizers (such as those used by GPT models) report 30-60% fewer tokens than JSON. The savings come from dropping redundant syntax such as braces, brackets, and repeated keys, while relying on indentation and length markers to preserve structure.

Purpose and Benefits

The main goal of TOON is to optimize for LLM contexts, where token limits and costs are critical. JSON's verbosity can inflate prompts unnecessarily, especially with large datasets, but TOON minimizes this by:

- Using indentation for nesting (similar to YAML).
- Employing a tabular row format for arrays (inspired by CSV).
- Including explicit length markers for arrays to aid parsing.
- Supporting optional delimiters like commas, tabs, or pipes to further reduce tokenization overhead.

Benchmarks show significant savings: for example, a dataset of GitHub repositories uses 42.3% fewer tokens in TOON than JSON, and e-commerce orders save about 35.4%. It's lossless, meaning you can convert JSON to TOON and back without data loss, but it's best for flat or uniformly structured data—deeply nested or irregular structures might not see as much benefit.
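To make the source of the savings concrete, here is a rough sketch that compares the size of one small dataset in compact JSON and in a hand-written TOON string. It counts characters, which is only a loose proxy for tokens; real savings depend on the tokenizer. The dataset and variable names are illustrative assumptions.

```python
import json

# Illustrative data: two uniform records with the same three fields.
users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]

# Compact JSON (no extra whitespace) repeats every key in every record.
as_json = json.dumps({"users": users}, separators=(",", ":"))

# Hand-written TOON: field names appear once, in the header.
as_toon = "users[2]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user"

# Character counts only approximate token counts.
saving = 1 - len(as_toon) / len(as_json)
print(f"JSON: {len(as_json)} chars, TOON: {len(as_toon)} chars, saving: {saving:.0%}")
```

Because the header is written once, the relative saving tends to grow with the number of rows.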

Syntax Rules

TOON's syntax is simple and focuses on brevity:

- Objects: Keys and values are on the same line, separated by a colon (:). Nested objects are indented (default: 2 spaces).
- Arrays: Start with the key followed by [N], where N is the exact item count (e.g., items[3]:). For uniform objects, add {field1,field2,...} after the length to define headers.
  - Primitive arrays: Inline values separated by the delimiter (default: comma).
  - Tabular arrays: Each row on a new indented line, with values matching the header order.
  - Non-uniform arrays: Use a list format with - for each item.
- Strings: Quoted only when necessary, for example when they have leading or trailing spaces, contain the active delimiter, a colon, or quotes, or could be mistaken for another type (e.g., numbers, true, false, null). Spaces inside a string do not require quotes.
- Delimiters: Comma (default), tab (\t), or pipe (|). Tabs often tokenize better in LLMs.
- Length Markers: Optional # prefix (e.g., [#2]) for clarity.
- Empty Structures: Empty arrays are key[0]:; empty objects produce no output.
- Quoting and Escaping: Minimal; inside a quoted string, escape quotes with a backslash.
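The quoting rule can be sketched as a small helper. This is a simplified approximation and not the official implementation: the function name `quote_if_needed` and the exact set of checks are assumptions, and the official spec remains the authority on edge cases.

```python
import re

# Assumption: anything matching this pattern would be read back as a number.
_NUMBER = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")


def quote_if_needed(value: str, delimiter: str = ",") -> str:
    """Return value unchanged if it is unambiguous, else a quoted, escaped form."""
    needs_quotes = (
        value == ""                                   # empty string would vanish
        or value != value.strip()                     # leading/trailing whitespace
        or value in ("true", "false", "null")         # looks like a literal
        or _NUMBER.fullmatch(value) is not None       # looks like a number
        or value.startswith("-")                      # conservative: list marker
        or any(ch in value for ch in (delimiter, ":", '"', "\\", "\n"))
    )
    if not needs_quotes:
        return value
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


print(quote_if_needed("Red and juicy fruit"))  # inner spaces stay unquoted
print(quote_if_needed("a,b"))                  # contains the delimiter
```

The helper errs on the side of quoting, which costs a few characters but keeps the value unambiguous.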

Examples

Here's a comparison:

JSON Example:

{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}

Equivalent TOON:

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
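The conversion above can be sketched in a few lines of Python. This is a minimal illustration, not the official encoder: `encode_tabular` and `format_value` are hypothetical names, string quoting is omitted, and the code assumes that a non-comma delimiter is also declared inside the length brackets.

```python
def format_value(value):
    """Render a primitive as TOON text (string quoting is omitted in this sketch)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before falling through to str()
        return "true" if value else "false"
    return str(value)


def encode_tabular(key, rows, delimiter=","):
    """Encode a list of dicts with identical fields as a TOON tabular array."""
    if not rows:
        return f"{key}[0]:"
    fields = list(rows[0])
    if any(list(row) != fields for row in rows):
        raise ValueError("rows must share the same fields in the same order")
    # Assumption: a non-default delimiter is repeated inside the brackets.
    marker = "" if delimiter == "," else delimiter
    lines = [f"{key}[{len(rows)}{marker}]{{{delimiter.join(fields)}}}:"]
    for row in rows:
        lines.append("  " + delimiter.join(format_value(row[f]) for f in fields))
    return "\n".join(lines)


users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]
print(encode_tabular("users", users))
```

Deriving the length from `len(rows)` means the header cannot drift out of sync with the data.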

Another example with mixed types:

JSON:

{
  "items": [1, "text", { "a": 1 }]
}

TOON (non-uniform array):

items[3]:
  - 1
  - text
  - a: 1

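Using a tab delimiter for better token efficiency (\t below stands for a literal tab character; a non-default delimiter is also declared inside the length brackets):

users[2\t]{id\tname\trole}:
  1\tAlice\tadmin
  2\tBob\tuser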

How to Use TOON for Prompts

TOON is straightforward to integrate into LLM prompts for both input and output:

As Input Data: Include TOON-formatted data in your prompt, wrapped in a fenced code block labeled toon.

Example Prompt:

The following data is in TOON format (2-space indentation, comma-delimited, arrays include length and field headers):

```toon
products[3]{id,name,price,stock}:
  101,Widget,19.99,50
  102,Gadget,29.99,30
  103,Tool,9.99,100
```

Based on this, recommend products under $20 with stock > 40.
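A prompt like this can be assembled programmatically. The sketch below is one possible way to do it; `build_prompt` is a hypothetical helper, and the fence marker is built from single backticks only so the example itself stays readable inside a code block.

```python
FENCE = "`" * 3  # three backticks, built indirectly to keep this block readable


def build_prompt(toon_data: str, question: str) -> str:
    """Wrap TOON data in a labeled fenced block and append the question."""
    return (
        "The following data is in TOON format (2-space indentation, "
        "comma-delimited, arrays include length and field headers):\n\n"
        f"{FENCE}toon\n{toon_data}\n{FENCE}\n\n"
        f"{question}"
    )


products = (
    "products[3]{id,name,price,stock}:\n"
    "  101,Widget,19.99,50\n"
    "  102,Gadget,29.99,30\n"
    "  103,Tool,9.99,100"
)
prompt = build_prompt(
    products, "Based on this, recommend products under $20 with stock > 40."
)
print(prompt)
```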

Instructing Output: Tell the LLM to generate responses in TOON format to save tokens on the output side. Provide the exact structure, including headers and length requirements.

Example Prompt:

Analyze the query and output results as TOON. Use this structure:
results[N]:
  - category: <category name>
    items[M]{name,description}:
      <name>,<description>

Query: List 2 fruits and 2 vegetables with brief descriptions.

Possible LLM Output:

results[2]:
  - category: fruits
    items[2]{name,description}:
      Apple,Red and juicy fruit
      Banana,Yellow curved fruit
  - category: vegetables
    items[2]{name,description}:
      Carrot,Orange root vegetable
      Broccoli,Green flowering vegetable

Best Practices:
- Always match the array length [N] to the actual number of items for reliability.
- Use tabs or pipes if your LLM's tokenizer handles them well (test with tools like gpt-tokenizer).
- For parsing TOON output, use libraries to convert back to JSON if needed.
- Combine with techniques like few-shot prompting: Show 1-2 TOON examples in the prompt to guide the model.
- Test for accuracy: in the project's benchmarks, models answered questions about TOON-formatted data slightly more accurately than about the same data in JSON (e.g., 86.6% vs 83.2%).
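The first practice, matching [N] to the actual row count, is easy to check when parsing model output. The sketch below handles only a single comma-delimited tabular block with unquoted values and leaves every value as a string; `decode_tabular` is a hypothetical name, and a real project would use one of the libraries listed under Implementations.

```python
import re

# Matches headers such as: products[3]{id,name,price,stock}:
HEADER = re.compile(r"^(?P<key>[^\[\s]+)\[(?P<n>\d+)\]\{(?P<fields>[^}]*)\}:$")


def decode_tabular(text):
    """Parse one comma-delimited tabular block and validate its declared length."""
    lines = [line for line in text.splitlines() if line.strip()]
    match = HEADER.match(lines[0].strip()) if lines else None
    if match is None:
        raise ValueError("expected a tabular header like key[N]{a,b}:")
    fields = match["fields"].split(",")
    declared = int(match["n"])
    rows = [line.strip().split(",") for line in lines[1:]]
    if len(rows) != declared:
        raise ValueError(f"header declares {declared} rows, found {len(rows)}")
    for row in rows:
        if len(row) != len(fields):
            raise ValueError(f"row {row!r} does not match fields {fields!r}")
    # Values stay strings; type coercion is left out of this sketch.
    return match["key"], [dict(zip(fields, row)) for row in rows]


key, rows = decode_tabular(
    "products[3]{id,name,price,stock}:\n"
    "  101,Widget,19.99,50\n"
    "  102,Gadget,29.99,30\n"
    "  103,Tool,9.99,100"
)
print(key, rows[0])
```

A length mismatch raises an error instead of silently accepting truncated output, which is the main reason the format carries an explicit count.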

Implementations

TOON has libraries in several languages for encoding/decoding:

- TypeScript/JavaScript: Official via npm (@toon-format/toon), with a CLI for file conversions.
- Elixir: Available on Hex.pm.
- Rust: On Crates.io.
- PHP: Community implementations for prompt integration.

For more details, check the official spec and benchmarks on GitHub. If you're working with large datasets in AI prompts, TOON can significantly cut costs and improve efficiency.

usametov/toon.md
