- Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
- Models and Issues in Data Stream Systems
- Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that visits some of the hostorical perspectives and how it all began with Flajolet
- Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
- [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
git ls-tree -r --name-only HEAD | while read filename; do | |
echo "$(git log -1 --format="%at | %h | %an | %ad |" -- $filename) $filename" | |
done |
A friend asked me for a few pointers to interesting, mostly recent papers on data warehousing and "big data" database systems, with an eye towards real-world deployments. I figured I'd share the list. It's biased and rather incomplete but maybe of interest to someone. While many are obvious choices (I've omitted several, like MapReduce), I think there are a few underappreciated gems.
###Dataflow Engines:
Dryad--general-purpose distributed parallel dataflow engine
http://research.microsoft.com/en-us/projects/dryad/eurosys07.pdf
Spark--in memory dataflow
http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
In this document I am using Sass's SCSS syntax. You can choose to use the indented syntax in sass, if you prefer it, it has no functional differences from the SCSS syntax.
For Less, I'm using the JavaScript version because this is what they suggest on the website. The ruby version may be different.