Parser generator

Tree-sitter is a high-speed, incremental parsing system that has revolutionized how editors like Neovim and VS Code handle syntax highlighting and code navigation. To understand its algorithms, you need to bridge the gap between traditional context-free grammars and modern incremental state management.

Here is your structured learning path to mastering the mechanics behind Tree-sitter.

Phase 1: The Theoretical Foundation

Before diving into Tree-sitter’s source code, you must understand the "Classical" way of parsing. Tree-sitter is based on LR(1) parsing, but with specific enhancements.

Context-Free Grammars (CFG): Understand Backus-Naur Form (BNF).
LR(1) Parsing: Learn how Shift-Reduce parsers work using a stack and a state machine.
The GLR Algorithm: Tree-sitter uses Generalized LR (GLR) to handle ambiguous grammars (like C++ or JavaScript) by splitting the parser into multiple "heads" when an ambiguity is encountered.

Phase 2: Incremental Parsing

This is the "secret sauce." Most parsers re-scan the entire file when you type a single character. Tree-sitter does not.

Subtree Reuse: Learn how the algorithm identifies which parts of the old Concrete Syntax Tree (CST) are unaffected by an edit.
The Versioned Tree: Study how nodes are marked as "dirty" and how the parser resumes from the closest valid state.
Complexity Goals: The goal is $O(\log n)$ for most edits, where $n$ is the number of tokens.

Phase 3: The Tree-sitter Implementation

Now, look at how the library actually handles the heavy lifting.

The Lexer vs. The Parser: Tree-sitter uses a generated C-based state machine.
External Scanners: Some languages (like Python with its indentation) aren't context-free. Learn how Tree-sitter uses "external scanners" to handle these edge cases.
Conflict Resolution: Study how prec (precedence) and left/right associativity are baked into the generated parse table.

Phase 4: Hands-on Exploration

The best way to understand the algorithm is to see it fail and succeed in real-time.

The Playground: Use the Tree-sitter Playground to visualize how a tree changes as you type.
Grammar Development: Write a simple grammar for a calculator or a "mini-JSON" using the grammar.js DSL.
Inspect the C: Run tree-sitter generate and look at the parser.c file. It’s a massive array of integers representing the state machine—try to trace a single "Shift" action.

alogic0/tree-sitter-learning-path.md

Select an option

No results found