@bigsnarfdude
Created October 22, 2025 16:32
algesnake examples
🐍 Algesnake - 39 Working Examples
basic_example.py (6 examples)
Foundation concepts using abstract algebra:
- Example 1: Max Monoid - Finding maximum values
- Example 2: Integer Addition Group - Addition with subtraction
- Example 3: Ring Operations - Distributivity (a × (b + c))
- Example 4: Complex Ring Expressions - Combined operations
- Example 5: MonoidWrapper - Quick prototyping without classes
- Example 6: Algebraic Law Verification - Testing monoid laws
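The Max monoid and the law check from Examples 1 and 6 can be sketched with the standard library alone. This is a minimal illustration, not algesnake's actual API: the names `Max`, `MAX_ZERO`, and `check_monoid_laws` are hypothetical and chosen only for this sketch.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Max:
    """Hypothetical Max monoid: combine keeps the larger value."""
    value: float

    def combine(self, other: "Max") -> "Max":
        return Max(max(self.value, other.value))


# Identity element: combining with it never changes the other operand.
MAX_ZERO = Max(float("-inf"))


def check_monoid_laws(a: Max, b: Max, c: Max) -> bool:
    """Return True if associativity and both identity laws hold for a, b, c."""
    associative = a.combine(b).combine(c) == a.combine(b.combine(c))
    left_identity = MAX_ZERO.combine(a) == a
    right_identity = a.combine(MAX_ZERO) == a
    return associative and left_identity and right_identity


values = [Max(3), Max(9), Max(4)]
largest = reduce(Max.combine, values, MAX_ZERO)
print(largest.value)
print(check_monoid_laws(*values))
```

Starting the fold from the identity means an empty input still yields a well-defined result, which is the property the later distributed examples rely on.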
monoid_examples.py (7 examples)
Real-world monoid use cases:
- Numeric examples - Add, Multiply, Max, Min
- Collection examples - Sets, Lists, Maps, Strings
- Option examples - Safe null handling with Some/None
- Distributed aggregation - Parallel processing across partitions
- Word count - MapReduce-style counting
- Statistics - Computing count/sum/max/min in one pass
- Config fallback - Chaining configuration sources
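The "statistics in one pass" and "distributed aggregation" ideas above can be combined in one small sketch: each partition is summarized independently, and the partial summaries are then merged. The `Stats` class and `summarize` helper are hypothetical names for illustration and are not taken from algesnake.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Stats:
    """Hypothetical product monoid tracking count, sum, max, and min together."""
    count: int = 0
    total: float = 0.0
    maximum: float = float("-inf")
    minimum: float = float("inf")

    @staticmethod
    def of(x: float) -> "Stats":
        # Summary of a single observation.
        return Stats(1, x, x, x)

    def combine(self, other: "Stats") -> "Stats":
        return Stats(
            self.count + other.count,
            self.total + other.total,
            max(self.maximum, other.maximum),
            min(self.minimum, other.minimum),
        )


def summarize(partition):
    """Fold one partition into a single Stats value, starting from the identity."""
    return reduce(Stats.combine, (Stats.of(x) for x in partition), Stats())


# Each inner list stands in for data held by a separate worker.
partitions = [[3, 1, 4], [1, 5], [9, 2, 6]]
partials = [summarize(p) for p in partitions]
merged = reduce(Stats.combine, partials, Stats())

# Summarizing everything in one place gives the same answer.
flat = summarize([x for p in partitions for x in p])
print(merged == flat)
```

Because every field combines associatively, the merged result does not depend on how the data was split across partitions.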
hyperloglog_bloom_examples.py (10 examples)
Cardinality & membership testing:
- HyperLogLog (4):
  - Basics - Counting unique users
  - Streaming - Real-time unique count
  - Distributed - Merging across servers
  - Sum() integration - Using Python's sum()
- Bloom Filter (3):
  - Basics - Spam email blocking
  - Web crawler - Duplicate URL detection
  - Distributed - Global spam blocklist
- Combined (3):
  - Analytics pipeline - Both together
  - Memory comparison - vs traditional sets
  - Convenience functions
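To make the "distributed spam blocklist" idea concrete, here is a minimal Bloom filter written from scratch with `hashlib`. It is a sketch under assumptions, not algesnake's implementation: the class name, the `merge` method, and the default sizes (`num_bits=1024`, `num_hashes=3`) are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; a Python int serves as the bit array."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def merge(self, other: "BloomFilter") -> "BloomFilter":
        # Union of two filters is a bitwise OR; parameters must match.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share num_bits and num_hashes")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


server_a = BloomFilter()
server_a.add("spam@example.com")
server_b = BloomFilter()
server_b.add("phish@example.com")

blocklist = server_a.merge(server_b)
print("spam@example.com" in blocklist)
print("phish@example.com" in blocklist)
```

Merging is a bitwise OR, so it is associative and the empty filter acts as the identity. A merged filter never loses an item that was added on any server, though it may report false positives for items that were never added.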
countmin_topk_examples.py (12 examples)
Frequency estimation & heavy hitters:
- CountMinSketch (5):
  - Basics - Word frequency counting
  - Error rate config - Setting accuracy bounds
  - Word counting - Large corpus processing
  - Distributed - Log analysis across servers
  - Heavy hitters - Finding DDoS attackers
- TopK (5):
  - Basics - Top 5 items tracking
  - Trending - Real-time hashtag trends
  - Streaming - Batch processing
  - Distributed - Page views across servers
  - Error analysis - Top errors to fix
- Combined (2):
  - CMS + TopK analytics
  - Memory comparison
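The frequency-estimation examples rest on the Count-Min Sketch, which can be illustrated in a few lines. The following is a standalone sketch, not algesnake's API: the class, its `merge` method, and the `width`/`depth` defaults are assumptions made for this example, and the IP addresses are invented sample data.

```python
import hashlib


class CountMinSketch:
    """Toy Count-Min Sketch: depth rows of width counters each."""

    def __init__(self, width: int = 256, depth: int = 4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row: int, item: str) -> int:
        # One salted hash per row selects the counter for this item.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the minimum is an upper-biased estimate.
        return min(self.table[row][self._index(row, item)] for row in range(self.depth))

    def merge(self, other: "CountMinSketch") -> "CountMinSketch":
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketches must share width and depth")
        merged = CountMinSketch(self.width, self.depth)
        merged.table = [
            [a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(self.table, other.table)
        ]
        return merged


# Hypothetical request logs from two servers.
server_a = CountMinSketch()
for ip in ["10.0.0.1"] * 50 + ["10.0.0.2"] * 3:
    server_a.add(ip)
server_b = CountMinSketch()
for ip in ["10.0.0.1"] * 70 + ["10.0.0.3"] * 2:
    server_b.add(ip)

combined = server_a.merge(server_b)
print(combined.estimate("10.0.0.1"))
```

The estimate never undercounts: for `10.0.0.1` it is at least the true total of 120, and it exceeds that only if another key collides with it in every row. Merging adds the tables cell by cell, which is why sketches built on separate servers can be combined after the fact.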
tdigest_examples.py (10 examples)
Percentile/quantile estimation:
- Basics - p50, p95, p99 percentiles
- Latency monitoring - API response times
- Distributed aggregation - Across multiple servers
- Streaming percentiles - Real-time updates
- CDF analysis - Cumulative distribution
- SLA monitoring - Dashboard with multiple endpoints
- Error vs Success - Comparing distributions
- Quantile queries - Database query times
- Memory efficiency - vs storing all values
- Convenience functions
📊 What These Demonstrate
Core Concept: Everything is a monoid (has combine + identity)
- Allows distributed aggregation (merge results from multiple machines)
- Associative (partial results can be grouped in any way → parallel processing)
- Identity element (empty/zero value that doesn't change results)
Probabilistic Structures trade exact answers for:
- 🔥 Massive memory savings (96-99% less memory)
- ⚡ Constant-time operations (O(1) regardless of data size)
- 🌐 Perfect for distributed systems (merge sketches from 1000s of servers)
🎯 Real Use Cases Shown
- Counting unique users (billions of events → 16KB memory)
- Spam/malware detection (instant lookups)
- Finding trending topics (top K hashtags)
- Monitoring API latency (p99 without storing all requests)
- DDoS detection (heavy hitter IPs)
- Web crawler deduplication
- Distributed log analysis
Total: 39 runnable examples, all tested and working ✅