Skip to content

Instantly share code, notes, and snippets.

@thanos
Created March 15, 2026 15:47
Show Gist options
  • Select an option

  • Save thanos/49dd79d99f0ab8a2469150ac7a7c3929 to your computer and use it in GitHub Desktop.

Select an option

Save thanos/49dd79d99f0ab8a2469150ac7a7c3929 to your computer and use it in GitHub Desktop.
Quantile sketches

Quantile sketches are a different beast—they don't just count unique items; they help you understand the distribution of your data (like finding the 95th percentile latency or the median salary) without storing every single value.

Here is the breakdown for Quantile data sketches in Markdown:

Algorithm Memory Accuracy Notes
GK Array $O(\frac{1}{\epsilon} \log(\epsilon N))$ Deterministic $\epsilon$-approx The classic foundation. Provides a guaranteed error bound but can be memory-heavy as $N$ grows.
KLL Sketch $O(\frac{1}{\epsilon})$ Probabilistic $\epsilon$-approx Near-optimal space complexity. Excellent for merging multiple sketches (map-reduce friendly).
T-Digest $O(K)$ (clusters) High at extremes Exceptional for tail latencies (99th, 99.9th percentiles) by using "centroids" that get smaller at the edges.
DDSketch $O(\frac{1}{\alpha} \log(\text{max}/\text{min}))$ Relative Error Maintains a fixed relative error (e.g., 1%) rather than a rank error. Great for monitoring systems.
Moments Sketch Very Low ($O(k)$ moments) Varies Uses statistical moments (mean, variance, skew). Fast but can be less accurate for complex distributions.
REQ Sketch $O(\frac{1}{\epsilon})$ High at one edge "Relative Error Quantiles." Specifically optimized for very high or very low rank accuracy (e.g., the "high-resolution" end of a distribution).

Quick Cheat Sheet for Choosing:

  • Use T-Digest if you are monitoring SLAs and care deeply about the 99th percentile (the "tail").
  • Use KLL if you need a mathematically robust, mergeable sketch for big data frameworks like Apache Spark or Flink.
  • Use DDSketch if you want a guaranteed relative error (e.g., "I'm always within 1% of the true value") across the entire range.

Would you like me to explain the difference between Rank Error (KLL) and Relative Error (DDSketch) in more detail?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment