Membership filters (also called Approximate Membership Query or AMQ structures) are used to quickly check if an element is in a set. They are designed to never return a false negative—if the filter says "No," it is definitely not there. If it says "Yes," there's a small chance it might be a false positive.
Here is the table for Membership Filter data sketches:
| Algorithm | Memory (vs Bloom) | CPU / Op | Accuracy (vs Bloom) | Delete | Merge | Notes |
|---|---|---|---|---|---|---|
| Bloom Filter |
|
Fast | Baseline | No | Yes | The industry standard; simple bit-array. |
| Counting Bloom | 3x – 4x Larger | Medium | Same | Yes | Yes | Adds counters to allow for item removal. |
| Cuckoo Filter | ~20% Smaller | Fast | Better at low |
Yes | No | Fingerprint-based; better space efficiency for low error rates. |
| XOR Filter | ~23% Smaller | Very Fast | Same | No | No | Static only; requires knowing all keys at construction. |
| Binary Fuse | ~35% Smaller | Extremely Fast | Same | No | No | Static only; modern successor to XOR with better cache locality. |
| Quotient Filter | Comparable | Medium | Same | Yes | Yes | High performance for SSD/Disk-resident filters. |
| Ribbon Filter | ~40% Smaller | Fast | Better efficiency | No | Yes | Static only; highly space-optimized for LSM-tree indexes. |
| GCS Filter | ~45% Smaller | Slow | Best efficiency | No | No | Static only; uses Golomb coding for maximum compression. |
- Dynamic vs. Static: If you need to add data over time, use Bloom or Cuckoo. If your data is fixed (e.g., an index for a static file), use Binary Fuse or Ribbon to save massive amounts of RAM.
- The "Deletion" Factor: Only Cuckoo and Quotient filters natively support removing an item without rebuilding the entire structure.
- Cache Efficiency: Quotient and Cuckoo filters are generally better for modern CPUs because they access contiguous memory slots rather than jumping to random bits like a standard Bloom Filter.
Would you like me to explain how "peeling" works in XOR and Binary Fuse filters?