Since you're architecting a high-performance parser for large-scale data (like your work with millions of records), Zig's approach to memory is your greatest ally. It avoids the "hidden" costs of garbage collection by making every allocation explicit.
In Zig, a function that needs heap memory conventionally takes an Allocator as a parameter.
Zig has no implicit global allocator, so nothing allocates behind your back. Instead, you pass an Allocator (an interface) to any structure, like your Parser, that needs to grow.
- Explicit Control: You decide if memory lives on the stack, the heap, or a fixed-size buffer.
- The defer Keyword: To prevent memory leaks, Zig uses defer to run cleanup code, such as freeing memory, when the enclosing scope exits.
- Safety: Using the GeneralPurposeAllocator (GPA) during development will catch double-frees when they happen and report memory leaks when you call deinit().
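As a minimal sketch of that explicit control, the example below backs a helper with a fixed-size stack buffer instead of the heap. The makeGreeting helper is hypothetical and exists only for illustration; FixedBufferAllocator and std.fmt.allocPrint come from the standard library.

```zig
const std = @import("std");

// Hypothetical helper: it never chooses where memory comes from,
// it just uses whatever Allocator the caller hands in.
fn makeGreeting(allocator: std.mem.Allocator, name: []const u8) ![]u8 {
    return std.fmt.allocPrint(allocator, "hello, {s}", .{name});
}

pub fn main() !void {
    // A fixed-size buffer on the stack backs every allocation; the heap is not used.
    var buffer: [64]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&buffer);
    const allocator = fba.allocator();

    const greeting = try makeGreeting(allocator, "parser");
    // defer runs when main's scope exits and releases the memory.
    defer allocator.free(greeting);

    std.debug.print("{s}\n", .{greeting}); // prints "hello, parser"
}
```

Swapping in a different allocator changes where the bytes live without touching makeGreeting at all.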

    const std = @import("std");

    pub fn main() !void {
        // 1. Initialize a GPA to track allocations and leaks
        var gpa = std.heap.GeneralPurposeAllocator(.{}){};
        const allocator = gpa.allocator();
        defer _ = gpa.deinit();

        // 2. Allocate a single integer on the heap
        const ptr = try allocator.create(i32);
        // 3. Defer the destruction (freeing) of that memory
        defer allocator.destroy(ptr);

        ptr.* = 42;
    }

For your Parser's nodes and stack, std.ArrayList(T) is the essential tool. It is a contiguous, growable array (similar to std::vector in C++).
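- It starts with zero capacity; the first append() allocates a small buffer.
- When you append() and the buffer is full, it requests a larger block. Capacity grows geometrically (by roughly 1.5x in recent standard library versions, not a strict doubling), so appends stay amortized O(1).
- If the allocator cannot extend the block in place, it copies the old elements to the new block and frees the old one.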
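To see that growth behavior for yourself, a small sketch like this counts how often the capacity changes while appending. It assumes the ArrayList API that stores its allocator (init(allocator), append(item)), so check the signatures against your compiler version, and countCapacityChanges is a hypothetical name.

```zig
const std = @import("std");

// Appends `count` integers and returns how many times the capacity changed.
fn countCapacityChanges(allocator: std.mem.Allocator, count: usize) !usize {
    var list = std.ArrayList(usize).init(allocator);
    defer list.deinit();

    var changes: usize = 0;
    var last_capacity: usize = list.capacity; // 0 right after init
    var i: usize = 0;
    while (i < count) : (i += 1) {
        try list.append(i);
        if (list.capacity != last_capacity) {
            changes += 1;
            last_capacity = list.capacity;
        }
    }
    return changes;
}

pub fn main() !void {
    const changes = try countCapacityChanges(std.heap.page_allocator, 1000);
    std.debug.print("1000 appends, {d} capacity changes\n", .{changes});
}
```

Because capacity grows geometrically, the number of changes stays small compared with the number of appends.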
- init(allocator): Creates the list with zero capacity.
- append(item): Adds an element. Since this can fail with error.OutOfMemory, you must handle the error, usually with try.
- items: A slice ([]T) giving you direct access to the underlying memory.
- pop(): Removes and returns the last element, which is critical for Shift-Reduce logic. In Zig 0.14 and later it returns an optional (?T) that is null when the list is empty.
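Here is a rough sketch of those four operations working together as a tiny shift-reduce style stack. The names popTop and shiftReduceSum are hypothetical, and popTop is a compatibility shim I am assuming you want, because pop() returns T on older Zig releases and ?T on newer ones.

```zig
const std = @import("std");

// Compatibility shim (an assumption of this sketch): unwrap the result of
// pop() only when this Zig version returns an optional.
fn popTop(stack: *std.ArrayList(i64)) i64 {
    const top = stack.pop();
    return if (@TypeOf(top) == i64) top else top.?;
}

// Shifts every value onto the stack, then repeatedly reduces the top two
// entries into their sum until a single result remains.
fn shiftReduceSum(allocator: std.mem.Allocator, values: []const i64) !i64 {
    var stack = std.ArrayList(i64).init(allocator); // zero capacity
    defer stack.deinit();

    for (values) |v| {
        try stack.append(v); // shift; may fail with error.OutOfMemory
    }
    if (stack.items.len == 0) return 0;

    while (stack.items.len > 1) { // items is the live []i64 view
        const rhs = popTop(&stack);
        const lhs = popTop(&stack);
        try stack.append(lhs + rhs); // reduce
    }
    return stack.items[0];
}

pub fn main() !void {
    const total = try shiftReduceSum(std.heap.page_allocator, &[_]i64{ 1, 2, 3, 4 });
    std.debug.print("total = {d}\n", .{total}); // prints "total = 10"
}
```

A real parser would reduce according to grammar rules rather than always summing, but the push and pop rhythm is the same.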
Using ArrayList(Node) with index-based NodeIds (as in your CST plan) is highly efficient:
- Cache Locality: Because all Node structs are side-by-side, the CPU can pre-fetch them. Jumping to nodes.items[node_id] is typically faster than following pointers to separately allocated nodes.
- Reduced Overhead: On 64-bit systems, a pointer is 8 bytes. A u32 (your NodeId) is only 4 bytes. This halves the memory footprint of your tree links.
- Batch Freeing: Instead of freeing thousands of individual nodes, calling nodes.deinit() releases the entire memory block at once.
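The layout above might look something like this in code. The Node fields, the no_node sentinel, and the addNode helper are all hypothetical stand-ins for your CST design, and the sketch assumes Zig 0.11 or later for the single-argument @intCast.

```zig
const std = @import("std");

const NodeId = u32;

// Sentinel meaning "no link" (an assumption of this sketch).
const no_node: NodeId = std.math.maxInt(NodeId);

// Hypothetical CST node: links are 4-byte indices instead of 8-byte pointers.
const Node = struct {
    kind: u8,
    first_child: NodeId,
    next_sibling: NodeId,
};

// Appends a node and returns its index, which stays valid as the list grows.
fn addNode(nodes: *std.ArrayList(Node), kind: u8) !NodeId {
    const id: NodeId = @intCast(nodes.items.len);
    try nodes.append(.{ .kind = kind, .first_child = no_node, .next_sibling = no_node });
    return id;
}

pub fn main() !void {
    var nodes = std.ArrayList(Node).init(std.heap.page_allocator);
    // One deinit releases every node at once.
    defer nodes.deinit();

    const root = try addNode(&nodes, 'R');
    const child = try addNode(&nodes, 'C');
    // Link by index, going through nodes.items at the moment of use.
    nodes.items[root].first_child = child;

    std.debug.print("link size: {d} bytes, pointer size: {d} bytes\n", .{ @sizeOf(NodeId), @sizeOf(*Node) });
    std.debug.print("root's first child has kind {c}\n", .{nodes.items[nodes.items[root].first_child].kind});
}
```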
A common mistake in Zig is holding a slice of an ArrayList while it’s still growing.
Warning: If you grab const slice = list.items and then call list.append(), the list might move to a new memory address to accommodate the new item. Your old slice then points to freed memory, and reading or writing through it is a use-after-free bug.
Rule of Thumb: When building a tree that is still growing, always refer to other nodes by their Index (NodeId), never by pointer or slice.
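A minimal sketch of the safe pattern, assuming the same ArrayList API with init(allocator) and append(item): keep an index across the appends and re-read list.items only when you need the element. The function name readFirstAfterGrowth is hypothetical.

```zig
const std = @import("std");

fn readFirstAfterGrowth(allocator: std.mem.Allocator) !u32 {
    var list = std.ArrayList(u32).init(allocator);
    defer list.deinit();

    try list.append(10);
    const first_index: usize = 0; // an index stays valid across reallocation

    // Do NOT keep `const slice = list.items;` across these appends:
    // growth may free the buffer that the slice points into.
    var i: u32 = 0;
    while (i < 1000) : (i += 1) {
        try list.append(i);
    }

    // Re-read list.items after the appends so the access uses the current buffer.
    return list.items[first_index];
}

pub fn main() !void {
    const first = try readFirstAfterGrowth(std.heap.page_allocator);
    std.debug.print("first = {d}\n", .{first}); // prints "first = 10"
}
```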
How familiar are you with Zig's Slices ([]T)? They are the "glue" that connects your ArrayList storage to your parsing logic.