Lesson 2 extension: Zig memory management

Since you're architecting a high-performance parser for large-scale data (like your work with millions of records), Zig's approach to memory is your greatest ally. It avoids the "hidden" costs of garbage collection by making every allocation explicit.

In Zig, a function that needs heap memory conventionally takes an Allocator as a parameter, so you can see at the call site which code is able to allocate.

1. Manual Memory Management: The Allocator

Zig has no default global allocator. Instead, you pass an Allocator (the std.mem.Allocator interface) to any structure, like your Parser, that needs to grow.

Explicit Control: You decide if memory lives on the stack, the heap, or a fixed-size buffer.
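The defer Keyword: defer runs a statement when the enclosing scope exits. Writing the free right after the allocation keeps the pair together and covers every exit path, including early returns on error.
Safety: Using the GeneralPurposeAllocator (GPA) during development detects "double-frees" when they happen and reports leaks when gpa.deinit() is called. (Newer Zig releases, 0.14 onward, rename it to std.heap.DebugAllocator.)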

const std = @import("std");

pub fn main() !void {
    // 1. Initialize a GPA to track allocations; deinit() reports any leaks
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // 2. Allocate a single integer on the heap
    const ptr = try allocator.create(i32);
    
    // 3. Defer the destruction (freeing) of that memory
    defer allocator.destroy(ptr);
    
    ptr.* = 42;
}
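
The "Explicit Control" point can be sketched with a second allocator. This is a minimal example, assuming a Zig 0.13-era standard library; sumUpTo is a hypothetical helper invented for illustration, not part of your Parser. The same function works unchanged whether the caller hands it a heap-backed allocator or, as here, a fixed buffer on the stack.

```zig
const std = @import("std");

// Hypothetical helper: sums 1..n using scratch memory from whatever
// allocator the caller supplies. It never decides where memory lives.
fn sumUpTo(allocator: std.mem.Allocator, n: usize) !u64 {
    const scratch = try allocator.alloc(u64, n);
    defer allocator.free(scratch);

    for (scratch, 0..) |*slot, i| slot.* = i + 1;

    var total: u64 = 0;
    for (scratch) |v| total += v;
    return total;
}

pub fn main() !void {
    // A fixed 1 KiB buffer on the stack; the heap is not involved.
    var buffer: [1024]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&buffer);

    const total = try sumUpTo(fba.allocator(), 10);
    std.debug.print("sum = {d}\n", .{total}); // sum = 55

    // 1000 * 8 bytes does not fit in 1 KiB, so this request fails cleanly.
    if (sumUpTo(fba.allocator(), 1000)) |_| {
        std.debug.print("unexpectedly fit\n", .{});
    } else |err| {
        std.debug.print("large request failed: {s}\n", .{@errorName(err)});
    }
}
```

Because the allocator is a parameter, swapping the GPA in for the fixed buffer is a one-line change in main and requires no change in sumUpTo.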

2. Deep Dive: `std.ArrayList`

For your Parser's nodes and stack, std.ArrayList(T) is the essential tool. It is a contiguous, growable array (similar to std::vector in C++).

How it works internally:

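It starts empty: init() allocates nothing, and the first append() triggers the first allocation.
When you append() and the buffer is full, it requests a larger block. Capacity grows geometrically (roughly 1.5x plus a small constant in the standard library's growth policy, rather than strict doubling), so appends stay amortized O(1).
If the allocator cannot extend the existing block in place, it copies the old elements to the new block and frees the old one.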
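
The growth steps above can be observed directly. This is a small sketch, assuming the managed std.ArrayList API of Zig 0.13 (init(allocator), a public capacity field); countGrowths is a hypothetical name. It makes no claim about the exact capacities, which depend on the standard library version.

```zig
const std = @import("std");

// Appends n items and counts how often the capacity changes.
fn countGrowths(allocator: std.mem.Allocator, n: usize) !usize {
    var list = std.ArrayList(u32).init(allocator);
    defer list.deinit();

    var growths: usize = 0;
    var last_capacity: usize = list.capacity; // 0: nothing allocated yet
    var i: u32 = 0;
    while (i < n) : (i += 1) {
        try list.append(i);
        if (list.capacity != last_capacity) {
            std.debug.print("len={d}: capacity {d} -> {d}\n", .{
                list.items.len, last_capacity, list.capacity,
            });
            last_capacity = list.capacity;
            growths += 1;
        }
    }
    return growths;
}

pub fn main() !void {
    const growths = try countGrowths(std.heap.page_allocator, 1000);
    std.debug.print("1000 appends, {d} reallocations\n", .{growths});
}
```

The number of reallocations should be far smaller than the number of appends, which is what makes append() cheap on average.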

Key Methods for your Parser:

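init(allocator): Creates the list with zero capacity. (This is the managed API of Zig 0.14 and earlier; later releases changed std.ArrayList so that the allocator is passed to each call, so check the docs for your version.)
append(item): Adds an element. Since this might fail (Out of Memory), it returns an error union that you must handle, usually with try.
items: A slice ([]T) giving you direct access to the underlying memory. It is only valid until the next operation that changes the capacity.
pop(): Removes and returns the last element, which is critical for Shift-Reduce logic. In Zig 0.14 and later it returns ?T (null when the list is empty).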
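
Here is how those four methods might combine in a reduce step. This is a sketch only, assuming Zig 0.13, where pop() returns T directly (on 0.14+ you would unwrap the optional). reduceAdd is a hypothetical stand-in for a real grammar rule.

```zig
const std = @import("std");

// Hypothetical reduce step: pops two operands and pushes their sum.
// Assumption: pop() returns T (Zig 0.13 and earlier).
fn reduceAdd(stack: *std.ArrayList(i64)) !void {
    std.debug.assert(stack.items.len >= 2);
    const rhs = stack.pop();
    const lhs = stack.pop();
    try stack.append(lhs + rhs);
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    var stack = std.ArrayList(i64).init(gpa.allocator());
    defer stack.deinit();

    // "Shift" two values, then "reduce" them.
    try stack.append(2);
    try stack.append(3);
    try reduceAdd(&stack);

    const top = stack.items[stack.items.len - 1];
    std.debug.print("top = {d}, depth = {d}\n", .{ top, stack.items.len });
}
```

After the reduce, the stack holds a single value, 5.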

3. Performance & Cache Locality

Using ArrayList(Node) with index-based NodeIds (as in your CST plan) is highly efficient:

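Cache Locality: Because all Node structs sit side-by-side in one buffer, sequential walks benefit from hardware prefetching and neighbouring nodes often share a cache line. Looking up nodes.items[node_id] is a single base-plus-offset access, which is typically friendlier to the cache than chasing pointers to individually allocated nodes.
Reduced Overhead: On 64-bit systems, a pointer is 8 bytes. A u32 (your NodeId) is only 4 bytes. This halves the memory footprint of your tree links, at the cost of capping the tree at about 4 billion nodes.
Batch Freeing: Instead of freeing thousands of individual nodes, calling nodes.deinit() releases the whole backing buffer in one call. Anything a node owns separately on the heap would still need its own cleanup.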
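
A minimal sketch of that layout follows, assuming Zig 0.13. The Node fields, the no_node sentinel, and addNode are hypothetical choices made for illustration; your CST plan may well differ.

```zig
const std = @import("std");

const NodeId = u32;
// Hypothetical sentinel meaning "no link"; plays the role of a null pointer.
const no_node: NodeId = std.math.maxInt(NodeId);

const Node = struct {
    kind: u8,
    first_child: NodeId = no_node,
    next_sibling: NodeId = no_node,
};

// Appends a node and returns its index, which serves as its identity.
fn addNode(nodes: *std.ArrayList(Node), kind: u8) !NodeId {
    const id: NodeId = @intCast(nodes.items.len);
    try nodes.append(.{ .kind = kind });
    return id;
}

pub fn main() !void {
    var nodes = std.ArrayList(Node).init(std.heap.page_allocator);
    defer nodes.deinit(); // one call releases every node

    const root = try addNode(&nodes, 'R');
    const child = try addNode(&nodes, 'C');
    nodes.items[root].first_child = child; // link by index, not by pointer

    std.debug.print("link: {d} bytes, pointer: {d} bytes\n", .{
        @sizeOf(NodeId), @sizeOf(*Node),
    });
}
```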

4. The "Invalid Slice" Pitfall

A common mistake in Zig is holding a slice of an ArrayList while it’s still growing.

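Warning: If you grab const slice = list.items and then call list.append(), the list might move to a new memory address to accommodate the new item. Your old slice then points to freed memory, and reading through it is a use-after-free bug. The same applies to element pointers such as &list.items[i].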

Rule of Thumb: When building a tree that is still growing, always refer to other nodes by their Index (NodeId), never by pointer or slice.
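
The safe pattern can be sketched as follows, assuming Zig 0.13; growAndReadBack is a hypothetical name. The old buffer address is recorded only so it can be compared afterwards; it is never dereferenced once the list has grown.

```zig
const std = @import("std");

// Stores one value, grows the list a lot, then reads the value back
// through its index instead of through a slice taken before the growth.
fn growAndReadBack(allocator: std.mem.Allocator) !u32 {
    var list = std.ArrayList(u32).init(allocator);
    defer list.deinit();

    try list.append(7);
    const first_id: usize = 0; // an index stays valid across growth
    const old_ptr = list.items.ptr; // kept for comparison only

    var i: u32 = 0;
    while (i < 10_000) : (i += 1) try list.append(i);

    if (list.items.ptr != old_ptr) {
        std.debug.print("buffer moved; any slice taken earlier is now stale\n", .{});
    }

    // Re-read list.items after growth, then index into it.
    return list.items[first_id];
}

pub fn main() !void {
    const value = try growAndReadBack(std.heap.page_allocator);
    std.debug.print("value at index 0 = {d}\n", .{value});
}
```

Whether the buffer actually moves depends on the allocator, which is exactly why code should never rely on it staying put.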

How familiar are you with Zig's Slices ([]T)? They are the "glue" that connects your ArrayList storage to your parsing logic.

alogic0/lexer_2_1.md
