Skip to content

Instantly share code, notes, and snippets.

@thanos
Last active March 15, 2026 16:24
Show Gist options
  • Select an option

  • Save thanos/3b95192f7f48965530351d71d578a717 to your computer and use it in GitHub Desktop.

Select an option

Save thanos/3b95192f7f48965530351d71d578a717 to your computer and use it in GitHub Desktop.
Membership Filter data sketches

Membership filters (also called Approximate Membership Query or AMQ structures) are used to quickly check if an element is in a set. They are designed to never return a false negative—if the filter says "No," it is definitely not there. If it says "Yes," there's a small chance it might be a false positive.

Here is the table for Membership Filter data sketches:

Algorithm Memory (vs Bloom) CPU / Op Accuracy (vs Bloom) Delete Merge Notes
Bloom Filter $1.44 \log_2(1/\epsilon)$ bits/item Fast Baseline No Yes The industry standard; simple bit-array.
Counting Bloom 3x – 4x Larger Medium Same Yes Yes Adds counters to allow for item removal.
Cuckoo Filter ~20% Smaller Fast Better at low $\epsilon$ Yes No Fingerprint-based; better space efficiency for low error rates.
XOR Filter ~23% Smaller Very Fast Same No No Static only; requires knowing all keys at construction.
Binary Fuse ~35% Smaller Extremely Fast Same No No Static only; modern successor to XOR with better cache locality.
Quotient Filter Comparable Medium Same Yes Yes High performance for SSD/Disk-resident filters.
Ribbon Filter ~40% Smaller Fast Better efficiency No Yes Static only; highly space-optimized for LSM-tree indexes.
GCS Filter ~45% Smaller Slow Best efficiency No No Static only; uses Golomb coding for maximum compression.

Key Comparisons

  • Dynamic vs. Static: If you need to add data over time, use Bloom or Cuckoo. If your data is fixed (e.g., an index for a static file), use Binary Fuse or Ribbon to save massive amounts of RAM.
  • The "Deletion" Factor: Only Cuckoo and Quotient filters natively support removing an item without rebuilding the entire structure.
  • Cache Efficiency: Quotient and Cuckoo filters are generally better for modern CPUs because they access contiguous memory slots rather than jumping to random bits like a standard Bloom Filter.

Would you like me to explain how "peeling" works in XOR and Binary Fuse filters?

@thanos
Copy link
Copy Markdown
Author

thanos commented Mar 15, 2026

Algorithm Memory (vs Bloom) CPU / Op Accuracy (vs Bloom) Delete Merge Notes
Bloom Filter Baseline Fast Baseline No Yes The industry standard; simple bit-array.
Counting Bloom 3x – 4x Larger Medium Same Yes Yes Adds counters to allow for item removal.
Cuckoo Filter ~20% Smaller Fast Better at low $\epsilon$ Yes No Fingerprint-based; better space efficiency for low error rates.
XOR Filter ~23% Smaller Very Fast Same No No Static only; requires knowing all keys at construction.
Binary Fuse ~35% Smaller Extremely Fast Same No No Static only; modern successor to XOR with better cache locality.
Quotient Filter Comparable Medium Same Yes Yes High performance for SSD/Disk-resident filters.
Ribbon Filter ~40% Smaller Fast Better efficiency No Yes Static only; highly space-optimized for LSM-tree indexes.
GCS Filter ~45% Smaller Slow Best efficiency No No Static only; uses Golomb coding for maximum compression.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment