thanos’s gists

thanos / mining_hardware.md

Created April 3, 2026 16:01

Mining Hardware

Hardware Generation	Architecture	Comparative Efficiency (2026 Context)	Status
CPU	General-purpose logic	Negligible for BTC; strong for Monero/Verus	Consumer-grade
GPU	Parallel graphics processing	Inefficient for SHA-256; versatile for altcoins/AI	Enthusiast-grade
FPGA	Reprogrammable logic	Niche; algorithm-agile	Specialist-grade
ASIC	Fixed-logic circuit	Maximum efficiency for specific algorithms	Industrial-grade

thanos / master_comparison_table_of_sketch_algorithms.md

Created March 16, 2026 00:52

Master Comparison Table of Sketch Algorithms

Family	Example Algorithms	Answers What Question?	Typical Error	Mergeable
Cardinality	HyperLogLog, CPC	How many unique elements exist?	~1%	Yes
Frequency	Count-Min, SpaceSaving	Which items appear most often?	Additive	Yes
Quantile	KLL, DDSketch	What are the percentiles?	Rank or relative	Yes
Membership	Bloom, Cuckoo, XOR	Have we seen this element?	False positives	No (usually)
Set	Theta, KMV	What is the overlap between sets?	~1–2%	Yes
Reconciliation	IBLT	What differs between datasets?	Capacity bound	Yes

thanos / Set-Reconciliation-Data-Sketches.md

Created March 16, 2026 00:49

Set Reconciliation Data Sketches

Algorithm	Approach	Trade-off
IBLT	Hash-based balance sheet	Simple, widely used
Characteristic Polynomial Filters (CPF)	Invertible polynomials	Better space efficiency, higher CPU cost
Eppstein's Straggler Detection	Simplified IBLT variant	Best for finding a small number of missing elements in a stream

thanos / set_data_sketches.md

Last active March 15, 2026 16:51

et Data Sketches (Advanced Operations)

Set Data Sketches (Advanced Operations)

These sketches are used when you need more than just a count; they allow for complex set operations like Intersections and Differences while maintaining a probabilistic estimate.

Algorithm	Memory	CPU / Op	Accuracy	Delete	Merge	Notes
Theta Sketch	Fixed ($O(k)$ entries)	Medium	Configurable	Yes	Yes	Gold standard for Intersections and Subtractions.
Invertible Bloom Lookup Table (IBLT)	$O(D)$ where $D$ is diff size	Medium	Exact (if $D$ is small)	Yes	Yes	Specifically designed for Set Reconciliation; can list actual missing keys.
Tuple Sketch	$O(k)$ + Attributes	Medium	Configurable	Yes	Yes	Theta extension that tracks metadata/attributes per key.
HLL (HyperLogLog)	Extremely Low	Fast	$\approx 1.04 / \sqrt{m}$	No	Yes	Efficient for Unions; poor for intersections/differences.

thanos / membership_filters.md

Last active March 15, 2026 16:24

Membership Filter data sketches

Membership filters (also called Approximate Membership Query or AMQ structures) are used to quickly check if an element is in a set. They are designed to never return a false negative—if the filter says "No," it is definitely not there. If it says "Yes," there's a small chance it might be a false positive.

Here is the table for Membership Filter data sketches:

Algorithm	Memory (vs Bloom)	CPU / Op	Accuracy (vs Bloom)	Delete	Merge	Notes
Bloom Filter	$1.44 \log_2(1/\epsilon)$ bits/item	Fast	Baseline	No	Yes	The industry standard; simple bit-array.
Counting Bloom	3x – 4x Larger	Medium	Same	Yes	Yes	Adds counters to allow for item removal.
Cuckoo Filter	~20% Smaller	Fast	Better at low $\epsilon$	Yes	No	Fingerprint-based; better space efficiency for low error rates.

thanos / frequency_sketches_comparison.md

Last active March 15, 2026 16:04

Frequency sketches (also called "Count-Min" or "Heavy Hitters" sketches) are used when you need to know how often a specific item has appeared without storing every single occurrence. They are the go-to for identifying "Top-K" items or finding popular hashtags/IP addresses in real-time.

Frequency Data Sketches (Point Queries & Top-K)

Algorithm	Memory	CPU / Op	Accuracy	Notes
Count-Min Sketch	$O(\frac{1}{\epsilon} \log \frac{1}{\delta})$	Very Fast	Probabilistic (Overestimates)	The standard for frequency; easy to implement and merge.
Count-Sketch	Higher than Count-Min	Fast	Unbiased (Lower Variance)	Uses $\pm 1$ hashing; better error distribution but slightly more CPU.
Space-Saving	$O(\frac{1}{\epsilon})$	Medium	High for "Heavy Hitters"	Deterministic; maintains a "Stream Summary" of top elements.
Misra-Gries	$O(K)$	Fast	High for Top-$K$	Classic algorithm; finds elements with frequency $> N/(K+1)$.

thanos / quantile_sketches_comparison.md

Created March 15, 2026 15:47

Quantile sketches

Quantile sketches are a different beast—they don't just count unique items; they help you understand the distribution of your data (like finding the 95th percentile latency or the median salary) without storing every single value.

Here is the breakdown for Quantile data sketches in Markdown:

Algorithm	Memory	Accuracy	Notes
GK Array	$O(\frac{1}{\epsilon} \log(\epsilon N))$	Deterministic $\epsilon$-approx	The classic foundation. Provides a guaranteed error bound but can be memory-heavy as $N$ grows.
KLL Sketch	$O(\frac{1}{\epsilon})$	Probabilistic $\epsilon$-approx	Near-optimal space complexity. Excellent for merging multiple sketches (map-reduce friendly).
T-Digest	$O(K)$ (clusters)	High at extremes	Exceptional for tail latencies (99th, 99.9th percentiles) by using "centroids" that get smaller at the edges.
DDSketch	$O(\frac{1}{\alpha} \log(\text{max}/\text{min}))$	Relative Error	Maintains a fixed relative error (

thanos / cardinality_data_sketches.md

Last active March 15, 2026 16:00

Cardinality Data Sketches

You're absolutely right to call that out! I used a mix of formatting there, but you specifically asked for Markdown. I'll strip away the extra fluff and give you a clean, standard Markdown table that you can easily copy-paste into a README or documentation file.

Here is the data formatted as requested:

Algorithm	Memory	Accuracy	Notes
Linear Counting	$O(N)$ bits	High	Best for small sets; memory scales linearly with cardinality.
LogLog	$O(\log(\log N))$	$\approx 1.30 / \sqrt{m}$	Foundation for modern sketches; uses bit-pattern estimation.
HyperLogLog (HLL)	~1.5 KB for $10^9$ items	$\approx 1.04 / \sqrt{m}$	Industry standard; uses stochastic averaging for low variance.
HLL++	Variable (Sparse/Dense)	High (inc. small sets)	Google's version; fixes HLL's bias for small cardinalities.

thanos / Membership-Filter-Comparison.md

Last active March 13, 2026 21:09

Membership Filter Comparison

Algorithm	Deletions	Merges	Notes
Bloom Filter	difficult	yes - not dynamic	classic
Cuckoo Filter	YES	YES	modern
XOR Filter	no	no	modern
Quotient Filter	YES	YES	SSD friendly
CQF	yes	yes	high performance
Binary Fuse Filter	no	no	extremely compact modern
Ribbon Filter	no	no	space efficient new design

thanos / Quantile-Algorithm-Comparison.md

Last active March 13, 2026 19:05

Quantile Algorithm Comparison

Algorithm	Strength
KLL	General purpose
REQ	Optimised for high quantiles (99th, 99.9th)
DDSketch	Constant relative error across all quantiles
t-Digest	Highly accurate at the tails

thanos vassilakis thanos

Set Data Sketches (Advanced Operations)

Frequency Data Sketches (Point Queries & Top-K)