Quantile sketches are a different beast—they don't just count unique items; they help you understand the distribution of your data (like finding the 95th percentile latency or the median salary) without storing every single value.
Here is the breakdown for Quantile data sketches in Markdown:
| Algorithm | Memory | Accuracy | Notes |
|---|---|---|---|
| GK Array | Deterministic |
The classic foundation. Provides a guaranteed error bound but can be memory-heavy as |
|
| KLL Sketch | Probabilistic |
Near-optimal space complexity. Excellent for merging multiple sketches (map-reduce friendly). | |
| T-Digest |
|
High at extremes | Exceptional for tail latencies (99th, 99.9th percentiles) by using "centroids" that get smaller at the edges. |
| DDSketch | Relative Error | Maintains a fixed relative error (e.g., 1%) rather than a rank error. Great for monitoring systems. | |
| Moments Sketch | Very Low ( |
Varies | Uses statistical moments (mean, variance, skew). Fast but can be less accurate for complex distributions. |
| REQ Sketch | High at one edge | "Relative Error Quantiles." Specifically optimized for very high or very low rank accuracy (e.g., the "high-resolution" end of a distribution). |
- Use T-Digest if you are monitoring SLAs and care deeply about the 99th percentile (the "tail").
- Use KLL if you need a mathematically robust, mergeable sketch for big data frameworks like Apache Spark or Flink.
- Use DDSketch if you want a guaranteed relative error (e.g., "I'm always within 1% of the true value") across the entire range.
Would you like me to explain the difference between Rank Error (KLL) and Relative Error (DDSketch) in more detail?