Now we reach the defining feature of Tree-sitter: Incremental Parsing. In a traditional parser, if a user has a 10,000-line file and types a single }, you re-parse the whole file. In Tree-sitter, you only re-parse what changed.
To do this in Zig, we need to introduce Edit Mapping and Node Reuse.
When a user types, we receive an "edit" event. This isn't just the new string; it's the range of bytes that were replaced.
pub const Edit = struct {
start_byte: usize,
old_end_byte: usize,
new_end_byte: usize,
};When an edit occurs, we don't throw away the old tree. Instead, we "walk" the tree and mark every node that intersects with this byte range as dirty.
During the next parse, the parser maintains a "cursor" on the old tree and a "current position" in the new buffer.
Before the Lexer even runs, the Parser asks: "Is there a node in the old tree at this exact position that is NOT dirty and matches what I need right now?"
- Check Position: Does the old node's
start_bytematch my currentnew_pos? - Check State: Was the parser in the same StateId (from Phase 3) when it originally created this node?
- Action: If yes, Shift the entire subtree onto the new stack and skip its byte length. You just saved the work of parsing 1,000 lines.
You can implement a simple version of this by keeping your nodes in a sorted list or a spatial tree (like a Gap Buffer or a simple array of ranges).
pub fn findReusableNode(self: *Parser, current_pos: usize, current_state: StateId) ?NodeId {
// 1. Look in the old tree for a node at current_pos
const node_id = self.old_tree.findNodeAt(current_pos) orelse return null;
const node = self.old_tree.nodes.get(node_id);
// 2. If the node is marked dirty (overlaps with an Edit), we can't use it
if (node.is_dirty) return null;
// 3. Critically, the Parse Table must agree that this 'Symbol'
// is valid in the current StateId.
if (!self.parse_table.isValidTransition(current_state, node.tag)) return null;
return node_id;
}There is one trap: The Lookahead. If a node was originally parsed because the lexer looked at the next character to decide where to stop, and that character changed, the node might now be invalid even if the edit was "outside" the node.
Tree-sitter solves this by storing the max_lookahead_extent for every node. When you edit, you don't just mark nodes at the edit site as dirty; you mark nodes whose "lookahead" reached into the edit site.
Try to simulate an edit in Zig:
- Take a successfully parsed tree of
fn main() {}. - "Edit" the space between
fnandmain. - Write a function that iterates through your
std.ArrayList(Node)and marks any node whosetoken_rangeoverlaps the edited index as "invalid". - Run your parser again and see if it can "skip" re-parsing the
main() {}block by simply copying the oldNodeIds.
Because Tree-sitter relies on heavy "pointer-chasing" through trees, Zig’s ability to use Data-Oriented Design (storing nodes in contiguous arrays/slices) makes the "reuse" check incredibly fast compared to managed languages.
Does the logic of "Reusing a subtree only if the StateId matches" make sense? This is the bridge between the State Machine we built in the last chapter and the efficiency of a real-world editor.