@ties
Created February 9, 2026 19:57
Gemini Pro 3 and prefix data structures

Here is a combined overview of IPv4 and IPv6 lookup data structures, focusing on Fixed-Stride, Level-Compressed (LC), and Multi-Bit Tries.

This guide prioritizes L3 cache efficiency and Memory Mapping (mmap) capabilities in Rust.


The Fundamental Difference: Density vs. Sparsity

To choose the right structure, you must understand the discrepancy between IPv4 and IPv6:

  1. IPv4 (32-bit): The address space is small ($2^{32}$) and dense. You can afford "wasteful" pre-allocation (like a 16MB array) to gain raw speed.
  2. IPv6 (128-bit): The address space is massive ($2^{128}$) and sparse. You cannot pre-allocate large fixed strides without compression. You must use a structure that compresses empty branches (via Bitmaps or Level Compression) to fit in the CPU cache.

Strategy A: The "Speed King" (Fixed-Stride / DIR-24-8)

Best for: IPv4 (Production Routing)

This is the standard for high-performance software routers (DPDK, VPP). It minimizes memory accesses (usually 1 or 2) at the cost of memory usage.

  • Mechanism:
    • Table 1 (DIR-24): A flat array of $2^{24}$ (16 million) entries covering the first 24 bits of the IP.
    • Table 2 (DIR-8): Tiny secondary tables for the remaining 8 bits.
  • Cache: The main table (16-32MB) fits in L3 cache on modern server CPUs (which often have >40MB L3).
  • IPv6 Suitability: Impractical. The initial 24-bit stride itself is fine, but covering the remaining 104 bits of a 128-bit address would require millions of subsequent tables.
  • Rust Implementation:
    • dpdk-rs wrappers (binding to C) are common.
    • Manual Vec: In Rust, this is trivial to implement as a Vec<u16> (holding indices) for the 24-bit table.

Strategy B: The "Hardware Logic" (Tree-Bitmap)

Best for: IPv4 & IPv6 (Balanced)

This algorithm mimics hardware routers. It is a Fixed-Stride Trie but with Bitmap Compression. It is the most robust all-rounder for Rust implementations.

  • Mechanism:
    • It uses a fixed stride (e.g., 4 bits per node).
    • Instead of an array of 16 pointers (which bloats memory), it uses a 16-bit bitmap.
    • Children are stored contiguously in a single Vec.
  • L3 Cache: Excellent. Nodes are tiny (bitmap + index), so many fit in a cache line.
  • Mmap: High. Because it uses a single backing Vec (arena style), it can be serialized directly to disk and mapped back without pointer fixups.
  • Rust Recommendation:
    • Crate: ip_network_table
    • Details: Uses the Tree-Bitmap algorithm. It handles both IPv4 and IPv6 elegantly by changing the stride logic internally.
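The bitmap-plus-contiguous-children mechanism can be sketched in a few lines of Rust. This is a minimal illustration with made-up names, not the actual ip_network_table internals (real Tree-Bitmap nodes also carry a second bitmap for prefixes that terminate inside the node):

```rust
// Minimal sketch of a Tree-Bitmap style node with a 4-bit stride.
// Field names are illustrative, not taken from any specific crate.

struct TbNode {
    // Bit i set => a child exists for 4-bit value i.
    external: u16,
    // Prefixes ending *inside* this node (unused in this sketch).
    #[allow(dead_code)]
    internal: u16,
    // All children live contiguously in one shared Vec<TbNode> (arena),
    // starting at this index -- no per-child heap pointers.
    child_base: u32,
}

/// Returns the arena index of the child for 4-bit `nibble`, if present.
fn child_slot(node: &TbNode, nibble: u8) -> Option<usize> {
    let bit = 1u16 << nibble;
    if node.external & bit == 0 {
        return None;
    }
    // Children are packed densely: the offset is the number of existing
    // children with a smaller nibble (a popcount over the low bits).
    let offset = (node.external & (bit - 1)).count_ones() as usize;
    Some(node.child_base as usize + offset)
}

fn main() {
    // Children exist for nibbles 1, 4 and 9, stored at arena slots 100..103.
    let n = TbNode {
        external: (1 << 1) | (1 << 4) | (1 << 9),
        internal: 0,
        child_base: 100,
    };
    assert_eq!(child_slot(&n, 1), Some(100));
    assert_eq!(child_slot(&n, 9), Some(102));
    assert_eq!(child_slot(&n, 2), None);
    println!("tree-bitmap child indexing ok");
}
```

Because a node is just two small integers and an index, many nodes share a cache line, and the single backing Vec is what makes serialization straightforward.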

Strategy C: The "Modern LC-Trie" (Poptrie)

Best for: IPv6 (Extreme Read Speed)

Poptrie (Population Count Trie) is the evolution of the LC-Trie. It uses the CPU's popcnt instruction to compress variable strides into dense arrays.

  • Mechanism:
    • It analyzes the tree and creates nodes with large strides (e.g., 6 bits = 64 children).
    • It uses a 64-bit bitmap to mark which children exist.
    • The index is calculated: index = popcnt(bitmap & mask).
  • L3 Cache: Supreme. It achieves the highest density of any structure here. It fits massive routing tables (Full Internet BGP) into L2/L3 cache.
  • Mmap: Native. Poptrie is designed to be built into a contiguous array of u64s.
  • Rust Recommendation:
    • Crate: oxidecomputer/poptrie
    • Details: Developed for the Helios emulator. It is read-optimized and extremely compact.
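The index calculation above can be demonstrated directly with Rust's `count_ones`, which lowers to a single popcnt instruction on x86-64. Names here are illustrative, not the oxidecomputer/poptrie API:

```rust
// Sketch of Poptrie-style child indexing: 6-bit stride, 64-bit bitmap.
// Illustrative only; the real oxidecomputer/poptrie layout differs in detail.

/// Given a node's 64-bit occupancy bitmap and a 6-bit chunk of the key,
/// return the position of the child in the node's dense child array.
fn poptrie_index(bitmap: u64, chunk: u8) -> Option<usize> {
    let bit = 1u64 << chunk;
    if bitmap & bit == 0 {
        return None; // no child for this 6-bit value
    }
    // Rank query: count set bits strictly below `chunk`.
    // `count_ones` compiles to a single popcnt instruction.
    Some((bitmap & (bit - 1)).count_ones() as usize)
}

fn main() {
    // Only 3 of 64 possible children exist: chunks 5, 20 and 63.
    let bitmap = (1u64 << 5) | (1 << 20) | (1 << 63);
    assert_eq!(poptrie_index(bitmap, 5), Some(0));
    assert_eq!(poptrie_index(bitmap, 20), Some(1));
    assert_eq!(poptrie_index(bitmap, 63), Some(2));
    assert_eq!(poptrie_index(bitmap, 0), None);
    println!("poptrie indexing ok");
}
```

This is why a node with 3 children out of 64 costs a 64-bit bitmap plus 3 array slots, instead of 64 pointers.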

Combined Comparison Matrix

| Feature | DIR-24-8 | Tree-Bitmap | Poptrie / LC-Trie | Standard Radix |
| --- | --- | --- | --- | --- |
| Primary Use | IPv4 High-Speed | IPv4/IPv6 General | IPv6 High-Speed | Generic Maps |
| Lookup Speed | 🚀 Fastest (1-2 accesses) | ⚡ Fast (unrolled loops) | ⚡ Fast (popcnt math) | 🐒 Slower (pointer chasing) |
| Memory Footprint | 🐘 High (pre-allocated) | 📦 Compact | 🐜 Tiny | 🎈 Medium (pointers) |
| IPv6 Capable? | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes |
| L3 Cache Fit | ⚠️ Borderline (IPv4 only) | ✅ Good | ✅ Excellent | ❌ Poor |
| Mmap Friendly? | ✅ Yes (flat arrays) | ✅ Yes (contiguous Vec) | ✅ Yes (bit arrays) | ❌ No (heap pointers) |

Top Rust Recommendations

1. The "Standard" Choice: ip_network_table

If you need a crate that "just works" for both IPv4 and IPv6, supports Longest Prefix Match (LPM), and is memory efficient.

2. The "Performance" Choice: poptrie

If you are building a read-heavy system (like a firewall or packet filter) where you build the table once and query it billions of times.

3. The "Zero-Copy / Static" Choice: fst

If your data is static (e.g., a GeoIP database) and you want to mmap it from disk with zero parsing overhead.

  • Algo: Finite State Transducer (effectively a highly compressed, shared-suffix trie).
  • Note: You map IP bytes to values. It is immutable.
  • Link: crates.io/crates/fst

4. The "Manual" Approach (for IPv4 DIR-24-8)

If you specifically need the raw speed of DIR-24-8 for IPv4, there isn't a dominant general-purpose "pure Rust" crate, because the logic is so simple that most engines implement it manually.

  • Implementation Sketch:
    // A simple DIR-24-8 structure.
    // An entry is either empty, a terminal next hop,
    // or an index into `small_tables`.
    enum Entry<T> {
        Empty,
        NextHop(T),
        SubTable(u32),
    }

    struct Ipv4Table<T> {
        // Covers the first 24 bits (2^24 entries).
        main_table: Vec<Entry<T>>,

        // Covers the last 8 bits.
        // Allocated only when a prefix longer than /24 exists.
        small_tables: Vec<[Entry<T>; 256]>,
    }
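A lookup over this layout might look like the following. This is a self-contained sketch (it repeats the struct, with an assumed Entry encoding, so it compiles on its own); production tables instead pack each entry into a single 2-4 byte word to halve the footprint:

```rust
// Self-contained DIR-24-8 sketch. The Entry enum is an assumption made
// for illustration; real tables pack next-hop vs. sub-table-index into
// one small integer word.

#[derive(Clone, Copy, PartialEq, Debug)]
enum Entry<T: Copy> {
    Empty,
    NextHop(T),
    SubTable(u32), // index into `small_tables`
}

struct Ipv4Table<T: Copy> {
    main_table: Vec<Entry<T>>,          // 2^24 entries, indexed by the top 24 bits
    small_tables: Vec<[Entry<T>; 256]>, // one 256-entry table per expanded /24
}

impl<T: Copy> Ipv4Table<T> {
    fn new() -> Self {
        Self {
            main_table: vec![Entry::Empty; 1 << 24],
            small_tables: Vec::new(),
        }
    }

    /// One memory access for routes of length <= 24, two otherwise.
    fn lookup(&self, ip: u32) -> Option<T> {
        match self.main_table[(ip >> 8) as usize] {
            Entry::Empty => None,
            Entry::NextHop(t) => Some(t),
            Entry::SubTable(i) => match self.small_tables[i as usize][(ip & 0xFF) as usize] {
                Entry::NextHop(t) => Some(t),
                _ => None,
            },
        }
    }
}

fn main() {
    let mut table = Ipv4Table::<u16>::new();
    // Install 10.0.0.0/24 -> next hop 7 by filling one main-table slot.
    table.main_table[10 << 16] = Entry::NextHop(7);
    assert_eq!(table.lookup(0x0A00_0001), Some(7)); // 10.0.0.1
    assert_eq!(table.lookup(0x0A00_0101), None);    // 10.0.1.1
    println!("dir-24-8 lookup ok");
}
```

Note that inserting a /16 route means writing 256 main-table slots; the write amplification is the price of the 1-2 access lookup.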

Since you are open to dynamic stride (and assuming you want high performance for IPv6 with cache efficiency), the landscape opens up significantly.

For IPv6, a "Dynamic Stride" or "Adaptive" trie is actually superior to a Fixed-Stride trie. IPv6 addresses are 128 bits long but usually extremely sparse. A fixed stride forces you to allocate memory for empty space, whereas a dynamic stride (like an LC-Trie or ART) collapses the empty bits, allowing the structure to fit into the L3 cache.

Here are the best Rust implementations for Dynamic/Adaptive Stride Tries suitable for IPv6, specifically looking at cache locality and memory layout.

1. Poptrie (The "Modern" LC-Trie)

Algorithm: Population Count Trie
Stride: Dynamic/Compressed (e.g., a 6-bit stride compressed into dense arrays).

This is widely considered the successor to the LC-Trie for routing. Instead of a standard node with 64 pointers (for a 6-bit stride), it uses a 64-bit bitmap and the CPU's popcnt instruction.

  • How it works: If a node has only 3 valid children out of 64 possibilities, the bitmap marks their positions. The "index" in the child array is calculated by counting the bits set to 1 before the target bit.
  • Cache & Mmap: It stores nodes in a contiguous array (perfect for mmap and L3 cache). It eliminates the "pointer chasing" overhead.
  • Rust Implementation: oxidecomputer/poptrie
    • Context: Used by Oxide Computer (Bryan Cantrill et al.) for their high-performance hypervisor/emulator networking. It is designed precisely for the use case you described.

2. Adaptive Radix Tree (ART)

Algorithm: ART (Adaptive Radix Tree)
Stride: Adaptive (nodes change size: Node4, Node16, Node48, Node256).

ART is the standard-bearer for modern in-memory databases (like HyPer). It treats the IPv6 address as a sequence of bytes (8-bit stride), but the node itself changes layout dynamically based on how full it is.

  • How it works:
    • If a node has 2 children, it uses a tiny "Node4" (4 keys + 4 pointers).
    • If it grows to 20 children, it hot-swaps to a "Node48".
  • IPv6 Fit: Excellent. IPv6 hierarchies are often deep but sparse. ART collapses the path significantly compared to a binary trie.
  • Cache: Very good. By keeping nodes small, it maximizes the amount of the tree that stays in the CPU cache.
  • Rust Implementation: blart
    • Why this one: There are several ART crates, but blart is currently the most actively maintained and feature-complete Rust implementation.
    • Alternative: art-tree is another solid option.
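The node-size adaptation can be sketched as an enum over layouts. This is heavily simplified (only Node4 and Node256, no path-compression header, no Node16/Node48) and does not reflect blart's actual internals:

```rust
// Simplified ART node shapes for an 8-bit stride. Real implementations
// (blart, art-tree) add path compression and the Node16/Node48 layouts;
// this sketch shows only the smallest and largest shapes.

enum ArtNode<V> {
    // Up to 4 children: a linear scan over a tiny key array, which fits
    // comfortably in one cache line.
    Node4 {
        keys: [u8; 4],
        children: [Option<Box<ArtNode<V>>>; 4],
        len: u8,
    },
    // Fully expanded: direct indexing by key byte, no scan at all.
    Node256 {
        children: Box<[Option<Box<ArtNode<V>>>; 256]>,
    },
    Leaf(V),
}

fn find_child<'a, V>(node: &'a ArtNode<V>, key_byte: u8) -> Option<&'a ArtNode<V>> {
    match node {
        ArtNode::Node4 { keys, children, len } => {
            // Small node: linear search beats anything fancier at this size.
            (0..*len as usize)
                .find(|&i| keys[i] == key_byte)
                .and_then(|i| children[i].as_deref())
        }
        ArtNode::Node256 { children } => children[key_byte as usize].as_deref(),
        ArtNode::Leaf(_) => None,
    }
}

fn main() {
    let node = ArtNode::Node4 {
        keys: [7, 0, 0, 0],
        children: [Some(Box::new(ArtNode::Leaf(42u32))), None, None, None],
        len: 1,
    };
    match find_child(&node, 7) {
        Some(ArtNode::Leaf(v)) => assert_eq!(*v, 42),
        _ => panic!("expected leaf"),
    }
    assert!(find_child(&node, 8).is_none());
    println!("art node4 lookup ok");
}
```

When a Node4 overflows, an insert routine would copy its entries into the next larger shape; the per-node Box pointers here are also why a plain ART is not mmap-friendly without an arena rewrite.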

3. Tree-Bitmap (Dynamic/Hybrid)

Algorithm: Tree-Bitmap (stride usually 4 or 8, but compressed)
Stride: Fixed lookup logic, dynamic storage.

I mentioned this in the previous answer, but it remains relevant here. It sits between a pure LC-Trie and a Fixed-Stride trie.

  • How it works: It grabs a chunk of bits (stride), but unlike a naive array, it uses a bitmap to indicate which next-hops exist. It packs the children into a single contiguous vector.
  • Mmap: Because it is usually implemented as a single struct wrapping Vec<Node>, it is arguably the easiest to serialize directly to disk and mmap back into memory.
  • Rust Implementation: ip_network_table

Summary Comparison for IPv6

| Algorithm | Implementation | Cache Strategy | Best For... |
| --- | --- | --- | --- |
| Poptrie | oxidecomputer/poptrie | Bit-compressed arrays. Extreme density. Fits in L3. | Raw read speed: build once, read 10 million times/sec. |
| ART | blart | Adaptive node sizes. Optimizes for density at every level. | General purpose: frequent route insert/delete while maintaining speed. |
| TreeBitmap | ip_network_table | Contiguous vectors. Hardware-like stride logic. | Simplicity & mmap: a standard routing table structure that is easily persisted. |

Technical Note on "Memory Mapping" in Rust

If your goal is to mmap a pre-built trie from disk so it loads instantly (zero-copy), you must avoid standard Rust pointer types (Box, Rc, Arc) because they point to heap addresses that won't be valid when re-loaded.

  1. For ip_network_table: You can use libraries like serde with bincode to serialize the internal vectors, but true "zero-copy" mmap requires a library like rkyv.
  2. For Zero-Copy Trie: If you need absolute raw performance and zero-copy loading, look at fst. It is a Finite State Transducer (effectively a compressed trie). It maps byte-keys (IPv6 bytes) to values. It is immutable and designed specifically to be memory-mapped. It is often smaller than the raw data itself, guaranteeing L3 cache fit.
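The pointer problem can be shown concretely: nodes that reference each other by index into one flat Vec are position-independent, so their raw bytes survive a save/reload cycle, which is exactly what an mmap-backed table needs. The sketch below simulates the disk round trip in memory with assumed types; a real implementation would also have to pin down the on-disk layout (e.g. #[repr(C)] or explicit field encoding, and a fixed endianness, as done here):

```rust
// Sketch: index-based (arena) nodes are position-independent, so their
// serialized bytes can be persisted and mapped back. Box/Rc/Arc pointers
// would encode heap addresses that mean nothing in the next process.

#[derive(Clone, Copy, Debug, PartialEq)]
struct Node {
    bitmap: u16,
    child_base: u32, // index into the same Vec<Node>, NOT a pointer
    value: u32,
}

// Explicit little-endian field encoding: 2 + 4 + 4 = 10 bytes per node,
// independent of struct padding and host byte order.
fn encode(nodes: &[Node]) -> Vec<u8> {
    let mut out = Vec::with_capacity(nodes.len() * 10);
    for n in nodes {
        out.extend_from_slice(&n.bitmap.to_le_bytes());
        out.extend_from_slice(&n.child_base.to_le_bytes());
        out.extend_from_slice(&n.value.to_le_bytes());
    }
    out
}

fn decode(bytes: &[u8]) -> Vec<Node> {
    bytes
        .chunks_exact(10)
        .map(|c| Node {
            bitmap: u16::from_le_bytes([c[0], c[1]]),
            child_base: u32::from_le_bytes([c[2], c[3], c[4], c[5]]),
            value: u32::from_le_bytes([c[6], c[7], c[8], c[9]]),
        })
        .collect()
}

fn main() {
    let arena = vec![
        Node { bitmap: 0b10, child_base: 1, value: 0 },
        Node { bitmap: 0, child_base: 0, value: 99 },
    ];

    // Simulate writing to disk and mmap-ing back in another process.
    let reloaded = decode(&encode(&arena));
    assert_eq!(reloaded, arena);

    // The u32 index is still valid after the round trip.
    let child = reloaded[reloaded[0].child_base as usize];
    assert_eq!(child.value, 99);
    println!("arena round-trip ok");
}
```

With true mmap you would skip the decode step entirely and reinterpret the mapped bytes, which is where #[repr(C)] layouts or a library like rkyv come in.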