| Hardware Generation | Architecture | Comparative Efficiency (2026 Context) | Status |
|---|---|---|---|
| CPU | General-purpose logic | Negligible for BTC; strong for Monero/Verus | Consumer-grade |
| GPU | Parallel graphics processing | Inefficient for SHA-256; versatile for altcoins/AI | Enthusiast-grade |
| FPGA | Reprogrammable logic | Niche; algorithm-agile | Specialist-grade |
| ASIC | Fixed-logic circuit | Maximum efficiency for specific algorithms | Industrial-grade |
| Family | Example Algorithms | Answers What Question? | Typical Error | Mergeable |
|---|---|---|---|---|
| Cardinality | HyperLogLog, CPC | How many unique elements exist? | ~1% | Yes |
| Frequency | Count-Min, SpaceSaving | Which items appear most often? | Additive | Yes |
| Quantile | KLL, DDSketch | What are the percentiles? | Rank or relative | Yes |
| Membership | Bloom, Cuckoo, XOR | Have we seen this element? | False positives | Varies by filter |
| Set | Theta, KMV | What is the overlap between sets? | ~1–2% | Yes |
| Reconciliation | IBLT | What differs between datasets? | Capacity bound | Yes |
| Algorithm | Approach | Trade-off |
|---|---|---|
| IBLT | Hash-based balance sheet | Simple, widely used |
| Characteristic Polynomial Filters (CPF) | Invertible polynomials | Better space efficiency, higher CPU cost |
| Eppstein's Straggler Detection | Simplified IBLT variant | Best for finding a small number of missing elements in a stream |
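
The reconciliation idea behind the IBLT row can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: the names `IBLT`, `_h`, and `reconcile`, the three disjoint sub-tables, and the restriction to non-negative 64-bit integer keys are all choices made for the example. For brevity it inserts one side and deletes the other in a single table, which is equivalent to subtracting two separately built tables cell by cell.

```python
import hashlib


def _h(key: int, seed: int) -> int:
    """64-bit hash of a non-negative 64-bit integer; `seed` selects the function."""
    digest = hashlib.blake2b(key.to_bytes(8, "big"), digest_size=8,
                             salt=seed.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


class IBLT:
    """Hypothetical minimal IBLT: each cell keeps a count, a key XOR and a checksum XOR."""

    def __init__(self, cells_per_hash: int = 64, num_hashes: int = 3):
        self.k, self.w = num_hashes, cells_per_hash
        size = self.k * self.w
        self.count = [0] * size
        self.key_sum = [0] * size
        self.check_sum = [0] * size

    def _cells(self, key: int):
        # One cell in each of k disjoint sub-tables, so a key never hits a cell twice.
        return [i * self.w + _h(key, i) % self.w for i in range(self.k)]

    def _update(self, key: int, delta: int) -> None:
        check = _h(key, 255)
        for c in self._cells(key):
            self.count[c] += delta
            self.key_sum[c] ^= key
            self.check_sum[c] ^= check

    def insert(self, key: int) -> None:
        self._update(key, +1)

    def delete(self, key: int) -> None:
        self._update(key, -1)

    def decode(self):
        """Peel 'pure' cells (exactly one key left). Destroys the table contents."""
        added, removed = set(), set()
        progress = True
        while progress:
            progress = False
            for c in range(len(self.count)):
                sign = self.count[c]
                if sign in (1, -1) and _h(self.key_sum[c], 255) == self.check_sum[c]:
                    key = self.key_sum[c]
                    (added if sign == 1 else removed).add(key)
                    self._update(key, -sign)  # remove the key from all of its cells
                    progress = True
        complete = not any(self.count) and not any(self.key_sum)
        return added, removed, complete


def reconcile(local_keys, remote_keys):
    table = IBLT()
    for key in local_keys:
        table.insert(key)
    for key in remote_keys:
        table.delete(key)  # keys present on both sides cancel out
    return table.decode()


only_local, only_remote, complete = reconcile(range(0, 1000), range(3, 1003))
print(sorted(only_local), sorted(only_remote), complete)
```

The table size depends only on the expected number of differences, not on the size of either dataset. If the difference exceeds what the table can hold, `decode` reports `complete == False`, which is the "capacity bound" failure mode.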
Set sketches are used when you need more than just a count; they support set operations such as intersections and differences while still returning a probabilistic estimate.
| Algorithm | Memory | CPU / Op | Accuracy | Delete | Merge | Notes |
|---|---|---|---|---|---|---|
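| Theta Sketch | Fixed (bounded by the nominal entries $k$) | Medium | Configurable | Set difference only (A-not-B) | Yes | Gold standard for intersections and subtractions. |
| Invertible Bloom Lookup Table (IBLT) | Proportional to the expected difference size | Medium | Exact (if the difference fits within the table's capacity) | Yes | Yes | Specifically designed for set reconciliation; can list the actual missing keys. |
| Tuple Sketch | Fixed (as Theta, plus a summary per retained key) | Medium | Configurable | Set difference only (A-not-B) | Yes | Theta extension that tracks metadata/attributes per key. |
| HLL (HyperLogLog) | Extremely Low | Fast | ~$1.04/\sqrt{m}$ standard error ($m$ registers) | No | Yes | Efficient for unions; poor for intersections/differences. |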
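
To make the set operations in the table concrete, here is a minimal Theta-style sketch. It is an illustration under assumptions, not the Apache DataSketches API: the names `ThetaSketch`, `union`, `intersection`, and `difference`, the BLAKE2b hash, and the default `k = 1024` are all chosen for the example. Each sketch keeps the `k` smallest hash values plus a threshold `theta`, and every estimate is "retained hashes divided by the fraction of hash space below `theta`".

```python
import hashlib

MAX_HASH = 1 << 64


def _hash(item: str) -> int:
    return int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8).digest(), "big")


class ThetaSketch:
    def __init__(self, k: int = 1024):
        self.k = k
        self.theta = MAX_HASH   # all retained hashes are strictly below theta
        self.hashes = set()

    def add(self, item: str) -> None:
        h = _hash(item)
        if h < self.theta:
            self.hashes.add(h)
            self._trim()

    def _trim(self) -> None:
        # Keep the k smallest hashes; the (k+1)-th smallest becomes the new threshold.
        if len(self.hashes) > self.k:
            ordered = sorted(self.hashes)
            self.theta = ordered[self.k]
            self.hashes = set(ordered[:self.k])

    def estimate(self) -> float:
        # Exact while the sketch is not full (theta == MAX_HASH).
        return len(self.hashes) * MAX_HASH / self.theta


def _combine(a: ThetaSketch, b: ThetaSketch, hashes) -> ThetaSketch:
    out = ThetaSketch(min(a.k, b.k))
    out.theta = min(a.theta, b.theta)   # only hashes below both thresholds are comparable
    out.hashes = {h for h in hashes if h < out.theta}
    out._trim()
    return out


def union(a, b):
    return _combine(a, b, a.hashes | b.hashes)


def intersection(a, b):
    return _combine(a, b, a.hashes & b.hashes)


def difference(a, b):
    """A-not-B: items in a that are absent from b."""
    return _combine(a, b, a.hashes - b.hashes)


a, b = ThetaSketch(), ThetaSketch()
for i in range(20_000):
    a.add(f"user-{i}")
for i in range(10_000, 30_000):
    b.add(f"user-{i}")

print(round(union(a, b).estimate()),
      round(intersection(a, b).estimate()),
      round(difference(a, b).estimate()))
```

The intersection and difference estimates are noisier than the union because fewer retained hashes contribute to them; the smaller the overlap relative to the union, the larger the relative error.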
Membership filters (also called Approximate Membership Query or AMQ structures) quickly check whether an element is in a set. They are designed to never return a false negative: if the filter says "No," the element is definitely not there. If it says "Yes," there is a small chance that the answer is a false positive.
Here is the table for Membership Filter data sketches:
| Algorithm | Memory (vs Bloom) | CPU / Op | Accuracy (vs Bloom) | Delete | Merge | Notes |
|---|---|---|---|---|---|---|
| Bloom Filter | Baseline (~$1.44 \log_2(1/\epsilon)$ bits per item) | Fast | Baseline | No | Yes | The industry standard; simple bit-array. |
| Counting Bloom | 3x – 4x Larger | Medium | Same | Yes | Yes | Adds counters to allow for item removal. |
| Cuckoo Filter | ~20% Smaller | Fast | Better at low false-positive rates | Yes | No | Fingerprint-based; better space efficiency for low error rates. |
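
The "no false negatives, occasional false positives" behaviour is easiest to see in a plain Bloom filter. The following is a minimal sketch, assuming string items and BLAKE2b-based double hashing; the class name `BloomFilter` and its parameters `capacity` and `error_rate` are hypothetical names for this example, and the sizing uses the standard formulas $m = -n \ln \epsilon / (\ln 2)^2$ and $k = (m/n) \ln 2$.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, capacity: int, error_rate: float = 0.01):
        # m bits and k hash functions sized for `capacity` items at `error_rate`.
        self.m = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.blake2b(item.encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))

    def merge(self, other: "BloomFilter") -> None:
        # Merging is a bitwise OR, valid only for identically configured filters.
        if (self.m, self.k) != (other.m, other.k):
            raise ValueError("filters must share size and hash count")
        for i, byte in enumerate(other.bits):
            self.bits[i] |= byte


seen = BloomFilter(capacity=1000)
for i in range(1000):
    seen.add(f"key-{i}")

false_positives = sum(f"other-{i}" in seen for i in range(10_000))
print("key-42" in seen, false_positives / 10_000)
```

There is no `remove` method because clearing a bit could erase evidence of another item that shares it, which is why the Counting Bloom variant swaps bits for counters.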
Frequency sketches (the family behind "Count-Min" and "Heavy Hitters" algorithms) are used when you need to know how often a specific item has appeared without storing every single occurrence. They are the go-to for identifying "Top-K" items or finding popular hashtags/IP addresses in real time.
| Algorithm | Memory | CPU / Op | Accuracy | Notes |
|---|---|---|---|---|
| Count-Min Sketch | $O(\frac{1}{\epsilon} \log \frac{1}{\delta})$ counters | Very Fast | Probabilistic (Overestimates) | The standard for frequency; easy to implement and merge. |
| Count-Sketch | Higher than Count-Min | Fast | Unbiased (Lower Variance) | Uses signed ($\pm 1$) hash updates and takes the median across rows. |
| Space-Saving | $O(k)$ counters | Medium | High for "Heavy Hitters" | Deterministic; maintains a "Stream Summary" of top elements. |
| Misra-Gries | $O(k)$ counters | Fast | High for Top-$K$ | Classic algorithm; finds elements with frequency above $n/k$. |
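
As a concrete instance of the first row, here is a minimal Count-Min Sketch. It is a sketch under assumptions rather than any library's API: the class name `CountMinSketch`, the string-only items, and the per-row salted BLAKE2b hashes are choices made for the example. The sizing follows the usual rule of width $\lceil e/\epsilon \rceil$ and depth $\lceil \ln(1/\delta) \rceil$.

```python
import hashlib
import math
from collections import Counter


class CountMinSketch:
    def __init__(self, epsilon: float = 0.001, delta: float = 0.001):
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1 / delta))
        self.rows = [[0] * self.width for _ in range(self.depth)]
        self.total = 0

    def _columns(self, item: str):
        # One independent-looking hash per row, selected by the salt.
        for row in range(self.depth):
            digest = hashlib.blake2b(item.encode(), digest_size=8,
                                     salt=row.to_bytes(8, "big")).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        self.total += count
        for row, col in self._columns(item):
            self.rows[row][col] += count

    def estimate(self, item: str) -> int:
        # Collisions only ever add to a counter, so the minimum is the tightest bound.
        return min(self.rows[row][col] for row, col in self._columns(item))

    def merge(self, other: "CountMinSketch") -> None:
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketches must share dimensions")
        for mine, theirs in zip(self.rows, other.rows):
            for i, value in enumerate(theirs):
                mine[i] += value
        self.total += other.total


truth = Counter({"hot": 5000})
for i in range(2000):
    truth[f"tail-{i}"] = i % 5 + 1

cms = CountMinSketch()
for item, count in truth.items():
    cms.add(item, count)

print(cms.estimate("hot"), truth["hot"])
```

The estimate never falls below the true count, and with probability at least $1 - \delta$ it exceeds the true count by no more than $\epsilon$ times the total number of updates.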
Quantile sketches are a different beast: they don't just count unique items; they help you understand the distribution of your data (like finding the 95th percentile latency or the median salary) without storing every single value.
Here is the breakdown for Quantile data sketches:
| Algorithm | Memory | Accuracy | Notes |
|---|---|---|---|
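| GK Array | $O(\frac{1}{\epsilon} \log(\epsilon N))$ | Deterministic | The classic foundation. Provides a guaranteed error bound but can be memory-heavy as $N$ grows or $\epsilon$ shrinks. |
| KLL Sketch | $O(\frac{1}{\epsilon} \log \log \frac{1}{\delta})$ | Probabilistic | Near-optimal space complexity. Excellent for merging multiple sketches (map-reduce friendly). |
| T-Digest | Bounded by the compression parameter | High at extremes | Exceptional for tail latencies (99th, 99.9th percentiles) by using "centroids" that get smaller at the edges. |
| DDSketch | Grows with the logarithm of the value range | Relative Error | Maintains a fixed relative error ($\alpha$) across all quantiles. |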
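
The relative-error guarantee in the DDSketch row comes from logarithmically sized buckets. The following is a simplified illustration of that idea, assuming positive values only and an unbounded number of buckets; the class name `LogBucketSketch` is hypothetical, and the real DDSketch adds bucket-collapsing and support for zero and negative values. With $\gamma = (1+\alpha)/(1-\alpha)$, a value $x$ goes into bucket $\lceil \log_\gamma x \rceil$ and is reported as $2\gamma^i/(\gamma+1)$, which is within a factor $\alpha$ of every value in that bucket.

```python
import math
import random
from collections import Counter


class LogBucketSketch:
    def __init__(self, alpha: float = 0.01):
        self.alpha = alpha
        self.gamma = (1 + alpha) / (1 - alpha)
        self.log_gamma = math.log(self.gamma)
        self.buckets = Counter()   # bucket index -> number of values
        self.n = 0

    def add(self, value: float) -> None:
        if value <= 0:
            raise ValueError("this simplified sketch only handles positive values")
        self.buckets[math.ceil(math.log(value) / self.log_gamma)] += 1
        self.n += 1

    def quantile(self, q: float) -> float:
        rank = int(q * (self.n - 1))       # zero-based rank of the wanted value
        seen = 0
        for index in sorted(self.buckets):
            seen += self.buckets[index]
            if seen > rank:
                return 2 * self.gamma ** index / (self.gamma + 1)
        raise ValueError("empty sketch")

    def merge(self, other: "LogBucketSketch") -> None:
        if other.alpha != self.alpha:
            raise ValueError("sketches must share the same alpha")
        self.buckets.update(other.buckets)  # Counter.update adds the counts
        self.n += other.n


rng = random.Random(7)
latencies = [rng.lognormvariate(0, 2) for _ in range(20_000)]

sketch = LogBucketSketch(alpha=0.01)
for value in latencies:
    sketch.add(value)

latencies.sort()
for q in (0.5, 0.99, 0.999):
    exact = latencies[int(q * (len(latencies) - 1))]
    print(q, round(exact, 3), round(sketch.quantile(q), 3))
```

Because merging only adds bucket counts, two sketches with the same `alpha` combine without any loss of accuracy, which is what makes this family convenient for distributed aggregation.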
Here is the table for Cardinality data sketches:
| Algorithm | Memory | Accuracy | Notes |
|---|---|---|---|
| Linear Counting | $O(n)$ bits | High | Best for small sets; memory scales linearly with cardinality. |
| LogLog | $O(m \log \log n)$ bits for $m$ registers | ~$1.30/\sqrt{m}$ standard error | Foundation for modern sketches; uses bit-pattern estimation. |
| HyperLogLog (HLL) | ~1.5 KB for cardinalities beyond $10^9$ | ~2% standard error | Industry standard; uses stochastic averaging for low variance. |
| HLL++ | Variable (Sparse/Dense) | High (inc. small sets) | Google's version; fixes HLL's bias for small cardinalities. |
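
For a feel of how the HLL row works in practice, here is a minimal HyperLogLog. It is an illustrative sketch, not HLL++ or any library's implementation: the class name `HyperLogLog`, the default precision `p = 12`, and the 64-bit BLAKE2b hash are assumptions for the example. The first `p` bits of each hash pick a register, the register stores the longest run of leading zeros seen (plus one), and the estimate is a bias-corrected harmonic mean, with a fallback to linear counting for small sets.

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p                    # number of registers
        self.registers = bytearray(self.m)

    def add(self, item: str) -> None:
        x = int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8).digest(), "big")
        index = x >> (64 - self.p)                 # first p bits choose the register
        rest = x & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # position of the leftmost 1-bit
        self.registers[index] = max(self.registers[index], rank)

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)      # bias constant, valid for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)   # small range: linear counting
        return raw

    def merge(self, other: "HyperLogLog") -> None:
        if self.p != other.p:
            raise ValueError("sketches must share the same precision")
        for i, r in enumerate(other.registers):
            self.registers[i] = max(self.registers[i], r)


hll = HyperLogLog()
for i in range(50_000):
    hll.add(f"id-{i}")

left, right = HyperLogLog(), HyperLogLog()
for i in range(30_000):
    left.add(f"id-{i}")
for i in range(20_000, 50_000):
    right.add(f"id-{i}")
left.merge(right)

print(round(hll.estimate()), round(left.estimate()))
```

Merging takes the register-wise maximum, so the merged sketch is identical to one built from the combined stream. That is why unions are cheap for HLL while intersections have to be derived indirectly.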
| Algorithm | Deletions | Merges | Notes |
|---|---|---|---|
| Bloom Filter | No (needs a counting variant) | Yes (identically configured filters; size is fixed) | Classic |
| Cuckoo Filter | Yes | Not directly | Modern |
| XOR Filter | No | No | Modern |
| Quotient Filter | Yes | Yes | SSD friendly |
| CQF | Yes | Yes | High performance |
| Binary Fuse Filter | No | No | Extremely compact, modern |
| Ribbon Filter | No | No | Space efficient, new design |
| Algorithm | Strength |
|---|---|
| KLL | General purpose |
| REQ | Optimised for high quantiles (99th, 99.9th) |
| DDSketch | Constant relative error across all quantiles |
| t-Digest | Highly accurate at the tails |