Quantile sketches

Quantile sketches are a different beast—they don't just count unique items; they help you understand the distribution of your data (like finding the 95th percentile latency or the median salary) without storing every single value.

Here is the breakdown for Quantile data sketches in Markdown:

Algorithm	Memory	Accuracy	Notes
GK Array	$O(\frac{1}{\epsilon} \log(\epsilon N))$	Deterministic $\epsilon$-approx	The classic foundation. Provides a guaranteed error bound but can be memory-heavy as $N$ grows.
KLL Sketch	$O(\frac{1}{\epsilon})$	Probabilistic $\epsilon$-approx	Near-optimal space complexity. Excellent for merging multiple sketches (map-reduce friendly).
T-Digest	$O(K)$ (clusters)	High at extremes	Exceptional for tail latencies (99th, 99.9th percentiles) by using "centroids" that get smaller at the edges.
DDSketch	$O(\frac{1}{\alpha} \log(\text{max}/\text{min}))$	Relative Error	Maintains a fixed relative error (e.g., 1%) rather than a rank error. Great for monitoring systems.
Moments Sketch	Very Low ($O(k)$ moments)	Varies	Uses statistical moments (mean, variance, skew). Fast but can be less accurate for complex distributions.
REQ Sketch	$O(\frac{1}{\epsilon})$	High at one edge	"Relative Error Quantiles." Specifically optimized for very high or very low rank accuracy (e.g., the "high-resolution" end of a distribution).

Quick Cheat Sheet for Choosing:

Use T-Digest if you are monitoring SLAs and care deeply about the 99th percentile (the "tail").
Use KLL if you need a mathematically robust, mergeable sketch for big data frameworks like Apache Spark or Flink.
Use DDSketch if you want a guaranteed relative error (e.g., "I'm always within 1% of the true value") across the entire range.

Would you like me to explain the difference between Rank Error (KLL) and Relative Error (DDSketch) in more detail?

thanos/quantile_sketches_comparison.md

Select an option

No results found

Select an option

No results found

Quick Cheat Sheet for Choosing: