- Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
- Models and Issues in Data Stream Systems
- Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that visits some of the hostorical perspectives and how it all began with Flajolet
- Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
- [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
Slightly disorganized but reasonably complete notes on the algorithms, strategies and optimizations of the Akka Cluster implementation. Could use a lot more links and context etc., but was just written for my own understanding. Might be expanded later.
Links to papers and talks that have inspired the implementation can be found on the 10 last pages of this presentation.
This is the Gossip state representation:
{-# LANGUAGE TypeFamilies #-} | |
import Data.Function (on) | |
import Control.Applicative | |
data EventData e = EventData { | |
eventId :: Int, | |
body :: Event e | |
} |
There are lots of representations for strings. In most languages they pick one set of tradeoffs and run with it. In haskell the "default" implementation (at least the one in the prelude) is a pretty bad choice, but unlike most other languages (really) good implementations exist for pretty much every way you can twist these things. This can be a good thing, but it also leads to confusion, and frustration to find the right types and how to convert them.
#!/usr/bin/env python | |
from optparse import OptionParser | |
from brod.zk import * | |
import pickle | |
import struct | |
import socket | |
import sys | |
import time | |