Revisions

  1. @debasishg debasishg revised this gist Sep 16, 2015. 1 changed file with 7 additions and 4 deletions.
    11 changes: 7 additions & 4 deletions gistfile1.md
    Original file line number Diff line number Diff line change
    @@ -48,27 +48,30 @@
    12. [A Framework for Clustering Massive-Domain Data Streams](http://charuaggarwal.net/cskrevise.pdf) by Charu C. Aggarwal

    13. [A framework for clustering evolving data streams](http://dl.acm.org/citation.cfm?id=1315460) by Charu C. Aggarwal et al.
    14. *Presentations*

    14. [Unsupervised Feature Selection on Data Streams](https://www0.bnl.gov/isd/documents/89130.pdf) by Hao Huang

    15. *Presentations*
    * [Spark Streaming Use Cases](http://www.slideshare.net/pacoid/spark-streaming-case-studies) by Paco Nathan
    * [Tiny Batches in the wine, Shiny new bits in Spark Streaming](http://www.slideshare.net/pacoid/tiny-batches-in-the-wine-shiny-new-bits-in-spark-streaming) by Paco Nathan
    * [Real time Data Analysis Patterns](http://de.slideshare.net/mikiobraun/realtime-data-analysis-patterns) by Mikio Braun
    * [Streaming Big Data with Apache Spark, Kafka and Cassandra](http://www.slideshare.net/helenaedelson/streaming-big-data-with-apache-spark-apache-kafka-and-apache-cassandra-delivering-meaning-in-nearreal-time-at-high-velocity-at-massive-scale) by Helena Edelson
    * [Streaming Data Analysis and Online Learning](https://www.hakkalabs.co/articles/streaming-data-analysis-and-online-learning/) by John Myles White
    * [Algebra for Analytics](https://speakerdeck.com/johnynek/algebra-for-analytics) by Oscar Boykin @posco

    15. *Courses*
    16. *Courses*
    * [Alex Smola course at Berkeley SML: Data Streams](http://alex.smola.org/teaching/berkeley2012/streams.html)
    * [Piotr Indyk course at MIT Sketching, Streaming and Sub-linear Space Algorithms](http://stellar.mit.edu/S/course/6/fa07/6.895/)
    * [Andrew McGregor course at UMass on Advanced Algorithms](http://people.cs.umass.edu/~mcgregor/courses/CS711S12/index.html)
    * [Amit Chakrabarti course at Dartmouth on Data Stream Algorithms](http://www.cs.dartmouth.edu/~ac/Teach/CS49-Fall11/) and the entire course notes in [a single document](http://www.cs.dartmouth.edu/~ac/Teach/CS49-Fall11/Notes/lecnotes.pdf)

    16. *Incremental Learning with Decision Trees for Streamed Data*
    17. *Incremental Learning with Decision Trees for Streamed Data*
    * [Mining High-Speed Data Streams (Hoeffding Trees)](http://homes.cs.washington.edu/~pedrod/papers/kdd00.pdf) by Pedro Domingos and Geoff Hulten
    * [Mining Time Changing Data Streams](http://homes.cs.washington.edu/~pedrod/papers/kdd01b.pdf) by G. Hulten, L. Spencer, and P. Domingos.
    * [Comprehensive study on techniques of Incremental learning with decision trees for streamed data](http://www.ijeat.org/attachments/File/V1Issue3/C0201021312.pdf) by Prerana Gupta, Amit Thakkar, Amit Ganatra
    * [Use of Hoeffding trees in concept based data stream mining](http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4544780) by Hoeglinger, S. and Pears, R.

    17. *Clustering Data Streams*
    18. *Clustering Data Streams*
    * [Clustering Data Streams: Theory and Practice](http://infolab.stanford.edu/~loc/tkdepaper.pdf) by Sudipto Guha, Adam Meyerson, Nina Mishra, Rajeev Motwani and Liadan O’Callaghan
    * [Online clustering of data streams](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.83.728) by J. Beringer, E. Hüllermeier
    * [Conquering the divide: Continuous clustering of distributed data streams (2007)](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.131.5316) by Graham Cormode
  2. @debasishg debasishg revised this gist Sep 16, 2015. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions gistfile1.md
    @@ -34,6 +34,7 @@
    * [Sketching data structures](http://lkozma.net/blog/sketching-data-structures/) - a good overview of Bloom Filters and Count Min Sketch
    * [Sketching can improve linear regression](https://speakerdeck.com/timonk/sketching-as-a-tool-for-numerical-linear-algebra) and the [talk](http://www.youtube.com/watch?v=-7S9jNeY_R0) by David
    * [A Framework for Clustering Massive-Domain Data Streams](http://charuaggarwal.net/cskrevise.pdf) by Charu Aggarwal
    * [Streaming Anomaly Detection Using Randomized Matrix Sketching](http://www.cse.psu.edu/~kasivisw/randsketch.pdf) by Huang & Kasiviswanathan

    8. *Surveys*
    * [References for Data Stream Algorithms](http://dimacs.rutgers.edu/~graham/pubs/papers/bristol.pdf) by Graham Cormode : an exhaustive set of references with explanations
  3. @debasishg debasishg revised this gist Sep 10, 2015. 1 changed file with 10 additions and 1 deletion.
    11 changes: 10 additions & 1 deletion gistfile1.md
    @@ -22,6 +22,7 @@
    6. *Implementations*
    * [stream-lib](https://github.com/addthis/stream-lib) : A collection of Stream summarization and cardinality estimation algorithms like CM Sketch, Hyperloglog, Bloom Filters
    * [Algebird from Twitter](https://github.com/twitter/algebird)
    * [streamDM - Data Mining for Spark Streaming](http://huawei-noah.github.io/streamDM/)

    7. *Count-Min Sketch*
    * [An Improved Data Stream Summary: The Count-Min Sketch and its Applications](http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf) - Cormode & Muthukrishnan : The paper that introduced count min sketch
    @@ -62,5 +63,13 @@

    16. *Incremental Learning with Decision Trees for Streamed Data*
    * [Mining High-Speed Data Streams (Hoeffding Trees)](http://homes.cs.washington.edu/~pedrod/papers/kdd00.pdf) by Pedro Domingos and Geoff Hulten
    * [Mining Time Changing Data Streams](http://homes.cs.washington.edu/~pedrod/papers/kdd01b.pdf) by G. Hulten, L. Spencer, and P. Domingos.
    * [Comprehensive study on techniques of Incremental learning with decision trees for streamed data](http://www.ijeat.org/attachments/File/V1Issue3/C0201021312.pdf) by Prerana Gupta, Amit Thakkar, Amit Ganatra
    * [Use of Hoeffding trees in concept based data stream mining](http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4544780) by Hoeglinger, S. and Pears, R.
    * [Use of Hoeffding trees in concept based data stream mining](http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4544780) by Hoeglinger, S. and Pears, R.

    17. *Clustering Data Streams*
    * [Clustering Data Streams: Theory and Practice](http://infolab.stanford.edu/~loc/tkdepaper.pdf) by Sudipto Guha, Adam Meyerson, Nina Mishra, Rajeev Motwani and Liadan O’Callaghan
    * [Online clustering of data streams](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.83.728) by J. Beringer, E. Hüllermeier
    * [Conquering the divide: Continuous clustering of distributed data streams (2007)](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.131.5316) by Graham Cormode
    * [Clustering on Streams](http://www.cs.utah.edu/~suresh/papers/enc-db/entry.pdf) by Suresh Venkat

  4. @debasishg debasishg revised this gist Sep 8, 2015. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions gistfile1.md
    @@ -53,11 +53,13 @@
    * [Streaming Big Data with Apache Spark, Kafka and Cassandra](http://www.slideshare.net/helenaedelson/streaming-big-data-with-apache-spark-apache-kafka-and-apache-cassandra-delivering-meaning-in-nearreal-time-at-high-velocity-at-massive-scale) by Helena Edelson
    * [Streaming Data Analysis and Online Learning](https://www.hakkalabs.co/articles/streaming-data-analysis-and-online-learning/) by John Myles White
    * [Algebra for Analytics](https://speakerdeck.com/johnynek/algebra-for-analytics) by Oscar Boykin @posco

    15. *Courses*
    * [Alex Smola course at Berkeley SML: Data Streams](http://alex.smola.org/teaching/berkeley2012/streams.html)
    * [Piotr Indyk course at MIT Sketching, Streaming and Sub-linear Space Algorithms](http://stellar.mit.edu/S/course/6/fa07/6.895/)
    * [Andrew McGregor course at UMass on Advanced Algorithms](http://people.cs.umass.edu/~mcgregor/courses/CS711S12/index.html)
    * [Amit Chakrabarti course at Dartmouth on Data Stream Algorithms](http://www.cs.dartmouth.edu/~ac/Teach/CS49-Fall11/) and the entire course notes in [a single document](http://www.cs.dartmouth.edu/~ac/Teach/CS49-Fall11/Notes/lecnotes.pdf)

    16. *Incremental Learning with Decision Trees for Streamed Data*
    * [Mining High-Speed Data Streams (Hoeffding Trees)](http://homes.cs.washington.edu/~pedrod/papers/kdd00.pdf) by Pedro Domingos and Geoff Hulten
    * [Comprehensive study on techniques of Incremental learning with decision trees for streamed data](http://www.ijeat.org/attachments/File/V1Issue3/C0201021312.pdf) by Prerana Gupta, Amit Thakkar, Amit Ganatra
  5. @debasishg debasishg revised this gist Sep 8, 2015. 1 changed file with 5 additions and 1 deletion.
    6 changes: 5 additions & 1 deletion gistfile1.md
    @@ -57,4 +57,8 @@
    * [Alex Smola course at Berkeley SML: Data Streams](http://alex.smola.org/teaching/berkeley2012/streams.html)
    * [Piotr Indyk course at MIT Sketching, Streaming and Sub-linear Space Algorithms](http://stellar.mit.edu/S/course/6/fa07/6.895/)
    * [Andrew McGregor course at UMass on Advanced Algorithms](http://people.cs.umass.edu/~mcgregor/courses/CS711S12/index.html)
    * [Amit Chakrabarti course at Dartmouth on Data Stream Algorithms](http://www.cs.dartmouth.edu/~ac/Teach/CS49-Fall11/) and the entire course notes in [a single document](http://www.cs.dartmouth.edu/~ac/Teach/CS49-Fall11/Notes/lecnotes.pdf)
    * [Amit Chakrabarti course at Dartmouth on Data Stream Algorithms](http://www.cs.dartmouth.edu/~ac/Teach/CS49-Fall11/) and the entire course notes in [a single document](http://www.cs.dartmouth.edu/~ac/Teach/CS49-Fall11/Notes/lecnotes.pdf)
    16. *Incremental Learning with Decision Trees for Streamed Data*
    * [Mining High-Speed Data Streams (Hoeffding Trees)](http://homes.cs.washington.edu/~pedrod/papers/kdd00.pdf) by Pedro Domingos and Geoff Hulten
    * [Comprehensive study on techniques of Incremental learning with decision trees for streamed data](http://www.ijeat.org/attachments/File/V1Issue3/C0201021312.pdf) by Prerana Gupta, Amit Thakkar, Amit Ganatra
    * [Use of Hoeffding trees in concept based data stream mining](http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4544780) by Hoeglinger, S. and Pears, R.
  6. @debasishg debasishg revised this gist May 8, 2015. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions gistfile1.md
    @@ -32,6 +32,7 @@
    * [Sketch Techniques for Approximate Query Processing](http://people.cs.umass.edu/~mcgregor/711S12/sketches1.pdf) by Cormode
    * [Sketching data structures](http://lkozma.net/blog/sketching-data-structures/) - a good overview of Bloom Filters and Count Min Sketch
    * [Sketching can improve linear regression](https://speakerdeck.com/timonk/sketching-as-a-tool-for-numerical-linear-algebra) and the [talk](http://www.youtube.com/watch?v=-7S9jNeY_R0) by David
    * [A Framework for Clustering Massive-Domain Data Streams](http://charuaggarwal.net/cskrevise.pdf) by Charu Aggarwal

    8. *Surveys*
    * [References for Data Stream Algorithms](http://dimacs.rutgers.edu/~graham/pubs/papers/bristol.pdf) by Graham Cormode : an exhaustive set of references with explanations
  7. @debasishg debasishg revised this gist Mar 9, 2015. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions gistfile1.md
    @@ -5,6 +5,7 @@
    * [Approximate Frequency Counts over Data Streams](http://www.vldb.org/conf/2002/S10P03.pdf) by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
    * [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&type=pdf) by Graham Cormode & Marios Hadjieleftheriou
    * [The space complexity of approximating the frequency moments](http://www.tau.ac.il/~nogaa/PDFS/amsz4.pdf) by Noga Alon, Yossi Matias, Mario Szegedy : one of the most influential papers introducing succinctness in computing frequency moments
    * [Cuckoo Filter: Practically Better Than Bloom](http://www.pdl.cmu.edu/PDL-FTP/FS/cuckoo-conext2014.pdf) by Bin Fan, David G. Andersen, Michael Kaminsky, Michael D. Mitzenmacher

    2. [Hyperloglog and MinHash](http://tech.adroll.com/blog/data/2013/07/10/hll-minhash.html) : An implementation of a form of hyperloglog that adds MinHash capabilities to it, making set intersections possible. "While it does require extra processing power to deal with collecting all the minima, it’s possible to get satisfactory performance out of the structure for a relatively low storage or memory footprint"

  8. @debasishg debasishg revised this gist Mar 9, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion gistfile1.md
    @@ -55,4 +55,4 @@
    * [Alex Smola course at Berkeley SML: Data Streams](http://alex.smola.org/teaching/berkeley2012/streams.html)
    * [Piotr Indyk course at MIT Sketching, Streaming and Sub-linear Space Algorithms](http://stellar.mit.edu/S/course/6/fa07/6.895/)
    * [Andrew McGregor course at UMass on Advanced Algorithms](http://people.cs.umass.edu/~mcgregor/courses/CS711S12/index.html)
    * [Amit Chakrabarti course at Dartmouth on Data Stream Algorithms](http://www.cs.dartmouth.edu/~ac/Teach/CS49-Fall11/)
    * [Amit Chakrabarti course at Dartmouth on Data Stream Algorithms](http://www.cs.dartmouth.edu/~ac/Teach/CS49-Fall11/) and the entire course notes in [a single document](http://www.cs.dartmouth.edu/~ac/Teach/CS49-Fall11/Notes/lecnotes.pdf)
  9. @debasishg debasishg revised this gist Mar 8, 2015. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions gistfile1.md
    @@ -50,6 +50,7 @@
    * [Real time Data Analysis Patterns](http://de.slideshare.net/mikiobraun/realtime-data-analysis-patterns) by Mikio Braun
    * [Streaming Big Data with Apache Spark, Kafka and Cassandra](http://www.slideshare.net/helenaedelson/streaming-big-data-with-apache-spark-apache-kafka-and-apache-cassandra-delivering-meaning-in-nearreal-time-at-high-velocity-at-massive-scale) by Helena Edelson
    * [Streaming Data Analysis and Online Learning](https://www.hakkalabs.co/articles/streaming-data-analysis-and-online-learning/) by John Myles White
    * [Algebra for Analytics](https://speakerdeck.com/johnynek/algebra-for-analytics) by Oscar Boykin @posco
    15. *Courses*
    * [Alex Smola course at Berkeley SML: Data Streams](http://alex.smola.org/teaching/berkeley2012/streams.html)
    * [Piotr Indyk course at MIT Sketching, Streaming and Sub-linear Space Algorithms](http://stellar.mit.edu/S/course/6/fa07/6.895/)
  10. @debasishg debasishg revised this gist Dec 30, 2014. 1 changed file with 3 additions and 2 deletions.
    5 changes: 3 additions & 2 deletions gistfile1.md
    @@ -52,5 +52,6 @@
    * [Streaming Data Analysis and Online Learning](https://www.hakkalabs.co/articles/streaming-data-analysis-and-online-learning/) by John Myles White
    15. *Courses*
    * [Alex Smola course at Berkeley SML: Data Streams](http://alex.smola.org/teaching/berkeley2012/streams.html)
    * [Piotr Indyk course Sketching, Streaming and Sub-linear Space Algorithms](http://stellar.mit.edu/S/course/6/fa07/6.895/)
    * [Andrew McGregor course on Algorithms](http://people.cs.umass.edu/~mcgregor/courses/CS711S12/index.html)
    * [Piotr Indyk course at MIT Sketching, Streaming and Sub-linear Space Algorithms](http://stellar.mit.edu/S/course/6/fa07/6.895/)
    * [Andrew McGregor course at UMass on Advanced Algorithms](http://people.cs.umass.edu/~mcgregor/courses/CS711S12/index.html)
    * [Amit Chakrabarti course at Dartmouth on Data Stream Algorithms](http://www.cs.dartmouth.edu/~ac/Teach/CS49-Fall11/)
  11. @debasishg debasishg revised this gist Dec 30, 2014. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions gistfile1.md
    @@ -49,6 +49,7 @@
    * [Tiny Batches in the wine, Shiny new bits in Spark Streaming](http://www.slideshare.net/pacoid/tiny-batches-in-the-wine-shiny-new-bits-in-spark-streaming) by Paco Nathan
    * [Real time Data Analysis Patterns](http://de.slideshare.net/mikiobraun/realtime-data-analysis-patterns) by Mikio Braun
    * [Streaming Big Data with Apache Spark, Kafka and Cassandra](http://www.slideshare.net/helenaedelson/streaming-big-data-with-apache-spark-apache-kafka-and-apache-cassandra-delivering-meaning-in-nearreal-time-at-high-velocity-at-massive-scale) by Helena Edelson
    * [Streaming Data Analysis and Online Learning](https://www.hakkalabs.co/articles/streaming-data-analysis-and-online-learning/) by John Myles White
    15. *Courses*
    * [Alex Smola course at Berkeley SML: Data Streams](http://alex.smola.org/teaching/berkeley2012/streams.html)
    * [Piotr Indyk course Sketching, Streaming and Sub-linear Space Algorithms](http://stellar.mit.edu/S/course/6/fa07/6.895/)
  12. @debasishg debasishg revised this gist Dec 30, 2014. 1 changed file with 5 additions and 1 deletion.
    6 changes: 5 additions & 1 deletion gistfile1.md
    @@ -48,4 +48,8 @@
    * [Spark Streaming Use Cases](http://www.slideshare.net/pacoid/spark-streaming-case-studies) by Paco Nathan
    * [Tiny Batches in the wine, Shiny new bits in Spark Streaming](http://www.slideshare.net/pacoid/tiny-batches-in-the-wine-shiny-new-bits-in-spark-streaming) by Paco Nathan
    * [Real time Data Analysis Patterns](http://de.slideshare.net/mikiobraun/realtime-data-analysis-patterns) by Mikio Braun
    * [Streaming Big Data with Apache Spark, Kafka and Cassandra](http://www.slideshare.net/helenaedelson/streaming-big-data-with-apache-spark-apache-kafka-and-apache-cassandra-delivering-meaning-in-nearreal-time-at-high-velocity-at-massive-scale) by Helena Edelson
    * [Streaming Big Data with Apache Spark, Kafka and Cassandra](http://www.slideshare.net/helenaedelson/streaming-big-data-with-apache-spark-apache-kafka-and-apache-cassandra-delivering-meaning-in-nearreal-time-at-high-velocity-at-massive-scale) by Helena Edelson
    15. *Courses*
    * [Alex Smola course at Berkeley SML: Data Streams](http://alex.smola.org/teaching/berkeley2012/streams.html)
    * [Piotr Indyk course Sketching, Streaming and Sub-linear Space Algorithms](http://stellar.mit.edu/S/course/6/fa07/6.895/)
    * [Andrew McGregor course on Algorithms](http://people.cs.umass.edu/~mcgregor/courses/CS711S12/index.html)
  13. @debasishg debasishg revised this gist Dec 11, 2014. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions gistfile1.md
    @@ -46,5 +46,6 @@
    13. [A framework for clustering evolving data streams](http://dl.acm.org/citation.cfm?id=1315460) by Charu C. Aggarwal et al.
    14. *Presentations*
    * [Spark Streaming Use Cases](http://www.slideshare.net/pacoid/spark-streaming-case-studies) by Paco Nathan
    * [Tiny Batches in the wine, Shiny new bits in Spark Streaming](http://www.slideshare.net/pacoid/tiny-batches-in-the-wine-shiny-new-bits-in-spark-streaming) by Paco Nathan
    * [Real time Data Analysis Patterns](http://de.slideshare.net/mikiobraun/realtime-data-analysis-patterns) by Mikio Braun
    * [Streaming Big Data with Apache Spark, Kafka and Cassandra](http://www.slideshare.net/helenaedelson/streaming-big-data-with-apache-spark-apache-kafka-and-apache-cassandra-delivering-meaning-in-nearreal-time-at-high-velocity-at-massive-scale) by Helena Edelson
  14. @debasishg debasishg revised this gist Nov 30, 2014. 1 changed file with 5 additions and 1 deletion.
    6 changes: 5 additions & 1 deletion gistfile1.md
    @@ -43,4 +43,8 @@

    12. [A Framework for Clustering Massive-Domain Data Streams](http://charuaggarwal.net/cskrevise.pdf) by Charu C. Aggarwal

    13. [A framework for clustering evolving data streams](http://dl.acm.org/citation.cfm?id=1315460) by Charu C. Aggarwal et al.
    13. [A framework for clustering evolving data streams](http://dl.acm.org/citation.cfm?id=1315460) by Charu C. Aggarwal et al.
    14. *Presentations*
    * [Spark Streaming Use Cases](http://www.slideshare.net/pacoid/spark-streaming-case-studies) by Paco Nathan
    * [Real time Data Analysis Patterns](http://de.slideshare.net/mikiobraun/realtime-data-analysis-patterns) by Mikio Braun
    * [Streaming Big Data with Apache Spark, Kafka and Cassandra](http://www.slideshare.net/helenaedelson/streaming-big-data-with-apache-spark-apache-kafka-and-apache-cassandra-delivering-meaning-in-nearreal-time-at-high-velocity-at-massive-scale) by Helena Edelson
  15. @debasishg debasishg revised this gist Jun 12, 2014. 1 changed file with 5 additions and 1 deletion.
    6 changes: 5 additions & 1 deletion gistfile1.md
    @@ -38,5 +38,9 @@
    * [Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches](http://www.nowpublishers.com/articles/foundations-and-trends-in-databases/DBS-004) by Graham Cormode, Minos Garofalakis, Peter J. Haas and Chris Jermaine. Describes basic principles and recent developments in approximate query processing. It focuses on four key synopses: random samples, histograms, wavelets, and sketches. It considers issues such as accuracy, space and time efficiency, optimality, practicality, range of applicability, error bounds on query answers, and incremental maintenance. It also discusses the trade-offs between the different synopsis types.

    10. [Distributed Streams Algorithms for Sliding Windows](http://home.engineering.iastate.edu/~snt/pubs/tocs04.pdf) by Phillip B. Gibbons and Srikanta Tirthapura

    11. [Frugal Streaming](http://blog.aggregateknowledge.com/2013/09/16/sketch-of-the-day-frugal-streaming/)
    12. [A Framework for Clustering Massive-Domain Data Streams](http://charuaggarwal.net/cskrevise.pdf)

    12. [A Framework for Clustering Massive-Domain Data Streams](http://charuaggarwal.net/cskrevise.pdf) by Charu C. Aggarwal

    13. [A framework for clustering evolving data streams](http://dl.acm.org/citation.cfm?id=1315460) by Charu C. Aggarwal et al.
  16. @debasishg debasishg revised this gist Jun 12, 2014. 1 changed file with 2 additions and 1 deletion.
    3 changes: 2 additions & 1 deletion gistfile1.md
    @@ -38,4 +38,5 @@
    * [Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches](http://www.nowpublishers.com/articles/foundations-and-trends-in-databases/DBS-004) by Graham Cormode, Minos Garofalakis, Peter J. Haas and Chris Jermaine. Describes basic principles and recent developments in approximate query processing. It focuses on four key synopses: random samples, histograms, wavelets, and sketches. It considers issues such as accuracy, space and time efficiency, optimality, practicality, range of applicability, error bounds on query answers, and incremental maintenance. It also discusses the trade-offs between the different synopsis types.

    10. [Distributed Streams Algorithms for Sliding Windows](http://home.engineering.iastate.edu/~snt/pubs/tocs04.pdf) by Phillip B. Gibbons and Srikanta Tirthapura
    11. [Frugal Streaming](http://blog.aggregateknowledge.com/2013/09/16/sketch-of-the-day-frugal-streaming/)
    11. [Frugal Streaming](http://blog.aggregateknowledge.com/2013/09/16/sketch-of-the-day-frugal-streaming/)
    12. [A Framework for Clustering Massive-Domain Data Streams](http://charuaggarwal.net/cskrevise.pdf)
  17. @debasishg debasishg revised this gist Apr 23, 2014. 1 changed file with 2 additions and 1 deletion.
    3 changes: 2 additions & 1 deletion gistfile1.md
    @@ -37,4 +37,5 @@
    * [Data Streams - Algorithms and Applications](http://www.amazon.com/Data-Streams-Applications-Foundations-Theoretical/dp/193301914X) by S. Muthukrishnan : This is an excellent monograph with surveys of all algorithms related to data streams. Also a free copy of the book is available from Muthu's web site at http://www.cs.rutgers.edu/~muthu/
    * [Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches](http://www.nowpublishers.com/articles/foundations-and-trends-in-databases/DBS-004) by Graham Cormode, Minos Garofalakis, Peter J. Haas and Chris Jermaine. Describes basic principles and recent developments in approximate query processing. It focuses on four key synopses: random samples, histograms, wavelets, and sketches. It considers issues such as accuracy, space and time efficiency, optimality, practicality, range of applicability, error bounds on query answers, and incremental maintenance. It also discusses the trade-offs between the different synopsis types.

    10. [Distributed Streams Algorithms for Sliding Windows](http://home.engineering.iastate.edu/~snt/pubs/tocs04.pdf) by Phillip B. Gibbons and Srikanta Tirthapura
    10. [Distributed Streams Algorithms for Sliding Windows](http://home.engineering.iastate.edu/~snt/pubs/tocs04.pdf) by Phillip B. Gibbons and Srikanta Tirthapura
    11. [Frugal Streaming](http://blog.aggregateknowledge.com/2013/09/16/sketch-of-the-day-frugal-streaming/)
  18. @debasishg debasishg revised this gist Jan 1, 2014. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions gistfile1.md
    @@ -1,6 +1,7 @@
    1. *General Background and Overview*
    * [Probabilistic Data Structures for Web Analytics and Data Mining](http://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/) : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
    * [Models and Issues in Data Stream Systems](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.106.9846)
    * [Philippe Flajolet’s contribution to streaming algorithms](https://speakerdeck.com/timonk/philippe-flajolets-contribution-to-streaming-algorithms) : A presentation by Jérémie Lumbroso that revisits some of the historical perspectives and how it all began with Flajolet
    * [Approximate Frequency Counts over Data Streams](http://www.vldb.org/conf/2002/S10P03.pdf) by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
    * [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&type=pdf) by Graham Cormode & Marios Hadjieleftheriou
    * [The space complexity of approximating the frequency moments](http://www.tau.ac.il/~nogaa/PDFS/amsz4.pdf) by Noga Alon, Yossi Matias, Mario Szegedy : one of the most influential papers introducing succinctness in computing frequency moments
  19. @debasishg debasishg revised this gist Dec 31, 2013. No changes.
  20. @debasishg debasishg revised this gist Dec 31, 2013. 1 changed file with 2 additions and 1 deletion.
    3 changes: 2 additions & 1 deletion gistfile1.md
    @@ -28,7 +28,8 @@
    * [Streaming Algorithms and Sketches](http://blog.aggregateknowledge.com/2011/09/13/streaming-algorithms-and-sketches/) - Count Min Sketch on AK Tech Blog
    * Muthukrishnan [talking](http://www.youtube.com/watch?v=OOZC4KCErN0) on Count Min Sketch at AK Tech conference
    * [Sketch Techniques for Approximate Query Processing](http://people.cs.umass.edu/~mcgregor/711S12/sketches1.pdf) by Cormode
    * [Sketching data structures](http://lkozma.net/blog/sketching-data-structures/) - a good overview of Bloom Filters and Count Min Sketch :
    * [Sketching data structures](http://lkozma.net/blog/sketching-data-structures/) - a good overview of Bloom Filters and Count Min Sketch
    * [Sketching can improve linear regression](https://speakerdeck.com/timonk/sketching-as-a-tool-for-numerical-linear-algebra) and the [talk](http://www.youtube.com/watch?v=-7S9jNeY_R0) by David

    8. *Surveys*
    * [References for Data Stream Algorithms](http://dimacs.rutgers.edu/~graham/pubs/papers/bristol.pdf) by Graham Cormode : an exhaustive set of references with explanations
  21. @debasishg debasishg revised this gist Dec 31, 2013. 1 changed file with 20 additions and 20 deletions.
    40 changes: 20 additions & 20 deletions gistfile1.md
    @@ -5,34 +5,34 @@
    * [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&type=pdf) by Graham Cormode & Marios Hadjieleftheriou
    * [The space complexity of approximating the frequency moments](http://www.tau.ac.il/~nogaa/PDFS/amsz4.pdf) by Noga Alon, Yossi Matias, Mario Szegedy : one of the most influential papers introducing succinctness in computing frequency moments

    2. *Hyperloglog and MinHash* : An implementation of a form of hyperloglog that adds MinHash capabilities to it, making set intersections possible. "While it does require extra processing power to deal with collecting all the minima, it’s possible to get satisfactory performance out of the structure for a relatively low storage or memory footprint" (http://tech.adroll.com/blog/data/2013/07/10/hll-minhash.html)
    2. [Hyperloglog and MinHash](http://tech.adroll.com/blog/data/2013/07/10/hll-minhash.html) : An implementation of a form of hyperloglog that adds MinHash capabilities to it, making set intersections possible. "While it does require extra processing power to deal with collecting all the minima, it’s possible to get satisfactory performance out of the structure for a relatively low storage or memory footprint"

    3. *Streaming/Sketching Conference from AK Tech* : Contains links to videos and slides from the speakers like Muthukrishnan who spoke about Count Min Sketch (http://blog.aggregateknowledge.com/2013/05/23/foundation-capital-and-aggregate-knowledge-sponsor-streamingsketching-conference/)
    3. [Streaming/Sketching Conference from AK Tech](http://blog.aggregateknowledge.com/2013/05/23/foundation-capital-and-aggregate-knowledge-sponsor-streamingsketching-conference/) : Contains links to videos and slides from the speakers like Muthukrishnan who spoke about Count Min Sketch

    4. *Q-digest*
    * Medians and Beyond: New Aggregation Techniques for Sensor Networks : The paper that introduced q-digest for range queries and quantile approximation (http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf)
    * Blog post (http://papercruncher.com/2011/07/31/q-digest/)
    * Blog post (http://www.prelert.com/blog/q-digest-an-algorithm-for-computing-approximate-quantiles-on-a-collection-of-integers/)
    * The Art of Approximating Distributions: Histograms and Quantiles at Scale - an alternative approach to q-digest (http://metamarkets.com/2013/histograms/#)
    * [Medians and Beyond: New Aggregation Techniques for Sensor Networks](http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf) : The paper that introduced q-digest for range queries and quantile approximation
    * [Blog post on q-digest](http://papercruncher.com/2011/07/31/q-digest/)
    * [Blog post on approximate quantiles](http://www.prelert.com/blog/q-digest-an-algorithm-for-computing-approximate-quantiles-on-a-collection-of-integers/)
    * [The Art of Approximating Distributions](http://metamarkets.com/2013/histograms/#) : Histograms and Quantiles at Scale - an alternative approach to q-digest

    5. *t-digest* : A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means. Ted Dunning's variant of Q-digest that does some improvements (https://github.com/tdunning/t-digest)
    5. [t-digest](https://github.com/tdunning/t-digest) : A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means. Ted Dunning's variant of q-digest, with some improvements

    6. *Implementations*
    * stream-lib : A collection of Stream summarization and cardinality estimation algorithms like CM Sketch, Hyperloglog, Bloom Filters (https://github.com/addthis/stream-lib)
    * Algebird from Twitter : (https://github.com/twitter/algebird)
    * [stream-lib](https://github.com/addthis/stream-lib) : A collection of Stream summarization and cardinality estimation algorithms like CM Sketch, Hyperloglog, Bloom Filters
    * [Algebird from Twitter](https://github.com/twitter/algebird)

    7. *Count-Min Sketch*
    * An Improved Data Stream Summary: The Count-Min Sketch and its Applications - Cormode & Muthukrishnan : The paper that introduced count min sketch (http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf)
    * collection of information on Count Min Sketch (https://sites.google.com/site/countminsketch/)
    * Count Min Sketch by Cormode : Introductory paper (http://dimacs.rutgers.edu/~graham/pubs/papers/cmencyc.pdf)
    * Streaming Algorithms and Sketches - Count Min Sketch on AK Tech Blog (http://blog.aggregateknowledge.com/2011/09/13/streaming-algorithms-and-sketches/)
    * Muthukrishnan talking on Count Min Sketch at AK Tech conference (http://www.youtube.com/watch?v=OOZC4KCErN0)
    * Sketch Techniques for Approximate Query Processing by Cormode (http://people.cs.umass.edu/~mcgregor/711S12/sketches1.pdf)
    * Sketching data structures - a good overview of Bloom Filters and Count Min Sketch : (http://lkozma.net/blog/sketching-data-structures/)
    * [An Improved Data Stream Summary: The Count-Min Sketch and its Applications](http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf) - Cormode & Muthukrishnan : The paper that introduced count min sketch
    * [Collection of information on Count Min Sketch](https://sites.google.com/site/countminsketch/)
    * [Count Min Sketch](http://dimacs.rutgers.edu/~graham/pubs/papers/cmencyc.pdf) by Cormode : Introductory paper
    * [Streaming Algorithms and Sketches](http://blog.aggregateknowledge.com/2011/09/13/streaming-algorithms-and-sketches/) - Count Min Sketch on AK Tech Blog
    * Muthukrishnan [talking](http://www.youtube.com/watch?v=OOZC4KCErN0) on Count Min Sketch at AK Tech conference
    * [Sketch Techniques for Approximate Query Processing](http://people.cs.umass.edu/~mcgregor/711S12/sketches1.pdf) by Cormode
    * [Sketching data structures](http://lkozma.net/blog/sketching-data-structures/) - a good overview of Bloom Filters and Count Min Sketch

    8. *Surveys*
    * References for Data Stream Algorithms by Graham Cormode : an exhaustive set of references with explanations (http://dimacs.rutgers.edu/~graham/pubs/papers/bristol.pdf)
    * Data Streams - Algorithms and Applications by s. Muthukrishnan (http://www.amazon.com/Data-Streams-Applications-Foundations-Theoretical/dp/193301914X) : This is an excellent monograph with surveys of all algorithms related to data streams. Also a free copy of the book is available from Muthu's web site at http://www.cs.rutgers.edu/~muthu/
    * Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches by Graham Cormode1, Minos Garofalakis, Peter J. Haas and Chris Jermaine (http://www.nowpublishers.com/articles/foundations-and-trends-in-databases/DBS-004). Describes basic principles and recent developments in approximate query processing. It focuses on four key synopses: random samples, histograms, wavelets, and sketches. It considers issues such as accuracy, space and time efficiency, optimality, practicality, range of applicability, error bounds on query answers, and incremental maintenance. It also discusses the trade-offs between the different synopsis types.
    * [References for Data Stream Algorithms](http://dimacs.rutgers.edu/~graham/pubs/papers/bristol.pdf) by Graham Cormode : an exhaustive set of references with explanations
    * [Data Streams - Algorithms and Applications](http://www.amazon.com/Data-Streams-Applications-Foundations-Theoretical/dp/193301914X) by S. Muthukrishnan : This is an excellent monograph with surveys of all algorithms related to data streams. Also a free copy of the book is available from Muthu's web site at http://www.cs.rutgers.edu/~muthu/
    * [Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches](http://www.nowpublishers.com/articles/foundations-and-trends-in-databases/DBS-004) by Graham Cormode, Minos Garofalakis, Peter J. Haas and Chris Jermaine. Describes basic principles and recent developments in approximate query processing. It focuses on four key synopses: random samples, histograms, wavelets, and sketches. It considers issues such as accuracy, space and time efficiency, optimality, practicality, range of applicability, error bounds on query answers, and incremental maintenance. It also discusses the trade-offs between the different synopsis types.

    10. *Distributed Streams Algorithms for Sliding Windows* by Phillip B. Gibbons and Srikanta Tirthapura (http://home.engineering.iastate.edu/~snt/pubs/tocs04.pdf)
    10. [Distributed Streams Algorithms for Sliding Windows](http://home.engineering.iastate.edu/~snt/pubs/tocs04.pdf) by Phillip B. Gibbons and Srikanta Tirthapura
  22. @debasishg debasishg revised this gist Dec 31, 2013. 1 changed file with 5 additions and 5 deletions.
    10 changes: 5 additions & 5 deletions gistfile1.md
    @@ -1,9 +1,9 @@
    1. *General Background and Overview*
    * [Probabilistic Data Structures for Web Analytics and Data Mining](http://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/) : On Highly Scalable Blog : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
    * Models and Issues in Data Stream Systems : (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.106.9846)
    * Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani (http://www.vldb.org/conf/2002/S10P03.pdf) : One of the early papers on the subject.
    * Methods for Finding Frequent Items in Data Streams by Graham Cormode & Marios Hadjieleftheriou (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&type=pdf)
    * The space complexity of approximating the frequency moments by Noga Alon, Yossi Matias, Mario Szegedy : one of the most influential papers introducing succinctness in computing frequency moments (http://www.tau.ac.il/~nogaa/PDFS/amsz4.pdf)
    * [Probabilistic Data Structures for Web Analytics and Data Mining](http://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/) : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
    * [Models and Issues in Data Stream Systems](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.106.9846)
    * [Approximate Frequency Counts over Data Streams](http://www.vldb.org/conf/2002/S10P03.pdf) by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
    * [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&type=pdf) by Graham Cormode & Marios Hadjieleftheriou
    * [The space complexity of approximating the frequency moments](http://www.tau.ac.il/~nogaa/PDFS/amsz4.pdf) by Noga Alon, Yossi Matias, Mario Szegedy : one of the most influential papers introducing succinctness in computing frequency moments

    2. *Hyperloglog and MinHash* : Implementation of a form of hyperloglog and adding capabilities of MinHash algorithm on to it which would enable to perform set intersections."While it does require extra processing power to deal with collecting all the minima, it’s possible to get satisfactory performance out of the structure for a relatively low storage or memory footprint" (http://tech.adroll.com/blog/data/2013/07/10/hll-minhash.html)

  23. @debasishg debasishg revised this gist Dec 31, 2013. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion gistfile1.md
    @@ -1,5 +1,5 @@
    1. *General Background and Overview*
    * Probabilistic Data Structures for Web Analytics and Data Mining : On Highly Scalable Blog (http://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/) : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
    * [Probabilistic Data Structures for Web Analytics and Data Mining](http://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/) : On Highly Scalable Blog : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
    * Models and Issues in Data Stream Systems : (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.106.9846)
    * Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani (http://www.vldb.org/conf/2002/S10P03.pdf) : One of the early papers on the subject.
    * Methods for Finding Frequent Items in Data Streams by Graham Cormode & Marios Hadjieleftheriou (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&type=pdf)
  24. @debasishg debasishg revised this gist Dec 31, 2013. 1 changed file with 9 additions and 9 deletions.
    18 changes: 9 additions & 9 deletions gistfile1.md
    @@ -1,27 +1,27 @@
    1. General Background and Overview
    1. *General Background and Overview*
    * Probabilistic Data Structures for Web Analytics and Data Mining : On Highly Scalable Blog (http://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/) : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
    * Models and Issues in Data Stream Systems : (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.106.9846)
    * Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani (http://www.vldb.org/conf/2002/S10P03.pdf) : One of the early papers on the subject.
    * Methods for Finding Frequent Items in Data Streams by Graham Cormode & Marios Hadjieleftheriou (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&type=pdf)
    * The space complexity of approximating the frequency moments by Noga Alon, Yossi Matias, Mario Szegedy : one of the most influential papers introducing succinctness in computing frequency moments (http://www.tau.ac.il/~nogaa/PDFS/amsz4.pdf)

    2. Hyperloglog and MinHash : Implementation of a form of hyperloglog and adding capabilities of MinHash algorithm on to it which would enable to perform set intersections."While it does require extra processing power to deal with collecting all the minima, it’s possible to get satisfactory performance out of the structure for a relatively low storage or memory footprint" (http://tech.adroll.com/blog/data/2013/07/10/hll-minhash.html)
    2. *Hyperloglog and MinHash* : Implementation of a form of hyperloglog and adding capabilities of MinHash algorithm on to it which would enable to perform set intersections."While it does require extra processing power to deal with collecting all the minima, it’s possible to get satisfactory performance out of the structure for a relatively low storage or memory footprint" (http://tech.adroll.com/blog/data/2013/07/10/hll-minhash.html)

    3. Streaming/Sketching Conference from AK Tech : Contains links to videos and slides from the speakers like Muthukrishnan who spoke about Count Min Sketch (http://blog.aggregateknowledge.com/2013/05/23/foundation-capital-and-aggregate-knowledge-sponsor-streamingsketching-conference/)
    3. *Streaming/Sketching Conference from AK Tech* : Contains links to videos and slides from the speakers like Muthukrishnan who spoke about Count Min Sketch (http://blog.aggregateknowledge.com/2013/05/23/foundation-capital-and-aggregate-knowledge-sponsor-streamingsketching-conference/)

    4. Q-digest
    4. *Q-digest*
    * Medians and Beyond: New Aggregation Techniques for Sensor Networks : The paper that introduced q-digest for range queries and quantile approximation (http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf)
    * Blog post (http://papercruncher.com/2011/07/31/q-digest/)
    * Blog post (http://www.prelert.com/blog/q-digest-an-algorithm-for-computing-approximate-quantiles-on-a-collection-of-integers/)
    * The Art of Approximating Distributions: Histograms and Quantiles at Scale - an alternative approach to q-digest (http://metamarkets.com/2013/histograms/#)

    5. t-digest : A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means. Ted Dunning's variant of Q-digest that does some improvements (https://github.com/tdunning/t-digest)
    5. *t-digest* : A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means. Ted Dunning's variant of Q-digest that does some improvements (https://github.com/tdunning/t-digest)

    6. Implementations
    6. *Implementations*
    * stream-lib : A collection of Stream summarization and cardinality estimation algorithms like CM Sketch, Hyperloglog, Bloom Filters (https://github.com/addthis/stream-lib)
    * Algebird from Twitter : (https://github.com/twitter/algebird)

    7. Count-Min Sketch
    7. *Count-Min Sketch*
    * An Improved Data Stream Summary: The Count-Min Sketch and its Applications - Cormode & Muthukrishnan : The paper that introduced count min sketch (http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf)
    * collection of information on Count Min Sketch (https://sites.google.com/site/countminsketch/)
    * Count Min Sketch by Cormode : Introductory paper (http://dimacs.rutgers.edu/~graham/pubs/papers/cmencyc.pdf)
    @@ -30,9 +30,9 @@
    * Sketch Techniques for Approximate Query Processing by Cormode (http://people.cs.umass.edu/~mcgregor/711S12/sketches1.pdf)
    * Sketching data structures - a good overview of Bloom Filters and Count Min Sketch : (http://lkozma.net/blog/sketching-data-structures/)

    8. Surveys
    8. *Surveys*
    * References for Data Stream Algorithms by Graham Cormode : an exhaustive set of references with explanations (http://dimacs.rutgers.edu/~graham/pubs/papers/bristol.pdf)
    * Data Streams - Algorithms and Applications by s. Muthukrishnan (http://www.amazon.com/Data-Streams-Applications-Foundations-Theoretical/dp/193301914X) : This is an excellent monograph with surveys of all algorithms related to data streams. Also a free copy of the book is available from Muthu's web site at http://www.cs.rutgers.edu/~muthu/
    * Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches by Graham Cormode1, Minos Garofalakis, Peter J. Haas and Chris Jermaine (http://www.nowpublishers.com/articles/foundations-and-trends-in-databases/DBS-004). Describes basic principles and recent developments in approximate query processing. It focuses on four key synopses: random samples, histograms, wavelets, and sketches. It considers issues such as accuracy, space and time efficiency, optimality, practicality, range of applicability, error bounds on query answers, and incremental maintenance. It also discusses the trade-offs between the different synopsis types.

    10. Distributed Streams Algorithms for Sliding Windows by Phillip B. Gibbons and Srikanta Tirthapura (http://home.engineering.iastate.edu/~snt/pubs/tocs04.pdf)
    10. *Distributed Streams Algorithms for Sliding Windows* by Phillip B. Gibbons and Srikanta Tirthapura (http://home.engineering.iastate.edu/~snt/pubs/tocs04.pdf)
  25. @debasishg debasishg revised this gist Dec 31, 2013. 1 changed file with 16 additions and 16 deletions.
    32 changes: 16 additions & 16 deletions gistfile1.md
    @@ -10,29 +10,29 @@
    3. Streaming/Sketching Conference from AK Tech : Contains links to videos and slides from the speakers like Muthukrishnan who spoke about Count Min Sketch (http://blog.aggregateknowledge.com/2013/05/23/foundation-capital-and-aggregate-knowledge-sponsor-streamingsketching-conference/)

    4. Q-digest
    a) Medians and Beyond: New Aggregation Techniques for Sensor Networks : The paper that introduced q-digest for range queries and quantile approximation (http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf)
    b) Blog post (http://papercruncher.com/2011/07/31/q-digest/)
    c) Blog post (http://www.prelert.com/blog/q-digest-an-algorithm-for-computing-approximate-quantiles-on-a-collection-of-integers/)
    d) The Art of Approximating Distributions: Histograms and Quantiles at Scale - an alternative approach to q-digest (http://metamarkets.com/2013/histograms/#)
    * Medians and Beyond: New Aggregation Techniques for Sensor Networks : The paper that introduced q-digest for range queries and quantile approximation (http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf)
    * Blog post (http://papercruncher.com/2011/07/31/q-digest/)
    * Blog post (http://www.prelert.com/blog/q-digest-an-algorithm-for-computing-approximate-quantiles-on-a-collection-of-integers/)
    * The Art of Approximating Distributions: Histograms and Quantiles at Scale - an alternative approach to q-digest (http://metamarkets.com/2013/histograms/#)

    5. t-digest : A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means. Ted Dunning's variant of Q-digest that does some improvements (https://github.com/tdunning/t-digest)

    6. Implementations
    a) stream-lib : A collection of Stream summarization and cardinality estimation algorithms like CM Sketch, Hyperloglog, Bloom Filters (https://github.com/addthis/stream-lib)
    b) Algebird from Twitter : (https://github.com/twitter/algebird)
    * stream-lib : A collection of Stream summarization and cardinality estimation algorithms like CM Sketch, Hyperloglog, Bloom Filters (https://github.com/addthis/stream-lib)
    * Algebird from Twitter : (https://github.com/twitter/algebird)

    7. Count-Min Sketch
    a) An Improved Data Stream Summary: The Count-Min Sketch and its Applications - Cormode & Muthukrishnan : The paper that introduced count min sketch (http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf)
    b) collection of information on Count Min Sketch (https://sites.google.com/site/countminsketch/)
    c) Count Min Sketch by Cormode : Introductory paper (http://dimacs.rutgers.edu/~graham/pubs/papers/cmencyc.pdf)
    d) Streaming Algorithms and Sketches - Count Min Sketch on AK Tech Blog (http://blog.aggregateknowledge.com/2011/09/13/streaming-algorithms-and-sketches/)
    e) Muthukrishnan talking on Count Min Sketch at AK Tech conference (http://www.youtube.com/watch?v=OOZC4KCErN0)
    f) Sketch Techniques for Approximate Query Processing by Cormode (http://people.cs.umass.edu/~mcgregor/711S12/sketches1.pdf)
    g) Sketching data structures - a good overview of Bloom Filters and Count Min Sketch : (http://lkozma.net/blog/sketching-data-structures/)
    * An Improved Data Stream Summary: The Count-Min Sketch and its Applications - Cormode & Muthukrishnan : The paper that introduced count min sketch (http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf)
    * collection of information on Count Min Sketch (https://sites.google.com/site/countminsketch/)
    * Count Min Sketch by Cormode : Introductory paper (http://dimacs.rutgers.edu/~graham/pubs/papers/cmencyc.pdf)
    * Streaming Algorithms and Sketches - Count Min Sketch on AK Tech Blog (http://blog.aggregateknowledge.com/2011/09/13/streaming-algorithms-and-sketches/)
    * Muthukrishnan talking on Count Min Sketch at AK Tech conference (http://www.youtube.com/watch?v=OOZC4KCErN0)
    * Sketch Techniques for Approximate Query Processing by Cormode (http://people.cs.umass.edu/~mcgregor/711S12/sketches1.pdf)
    * Sketching data structures - a good overview of Bloom Filters and Count Min Sketch : (http://lkozma.net/blog/sketching-data-structures/)

    8. Surveys
    a) References for Data Stream Algorithms by Graham Cormode : an exhaustive set of references with explanations (http://dimacs.rutgers.edu/~graham/pubs/papers/bristol.pdf)
    b) Data Streams - Algorithms and Applications by s. Muthukrishnan (http://www.amazon.com/Data-Streams-Applications-Foundations-Theoretical/dp/193301914X) : This is an excellent monograph with surveys of all algorithms related to data streams. Also a free copy of the book is available from Muthu's web site at http://www.cs.rutgers.edu/~muthu/
    c) Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches by Graham Cormode1, Minos Garofalakis, Peter J. Haas and Chris Jermaine (http://www.nowpublishers.com/articles/foundations-and-trends-in-databases/DBS-004). Describes basic principles and recent developments in approximate query processing. It focuses on four key synopses: random samples, histograms, wavelets, and sketches. It considers issues such as accuracy, space and time efficiency, optimality, practicality, range of applicability, error bounds on query answers, and incremental maintenance. It also discusses the trade-offs between the different synopsis types.
    * References for Data Stream Algorithms by Graham Cormode : an exhaustive set of references with explanations (http://dimacs.rutgers.edu/~graham/pubs/papers/bristol.pdf)
    * Data Streams - Algorithms and Applications by s. Muthukrishnan (http://www.amazon.com/Data-Streams-Applications-Foundations-Theoretical/dp/193301914X) : This is an excellent monograph with surveys of all algorithms related to data streams. Also a free copy of the book is available from Muthu's web site at http://www.cs.rutgers.edu/~muthu/
    * Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches by Graham Cormode1, Minos Garofalakis, Peter J. Haas and Chris Jermaine (http://www.nowpublishers.com/articles/foundations-and-trends-in-databases/DBS-004). Describes basic principles and recent developments in approximate query processing. It focuses on four key synopses: random samples, histograms, wavelets, and sketches. It considers issues such as accuracy, space and time efficiency, optimality, practicality, range of applicability, error bounds on query answers, and incremental maintenance. It also discusses the trade-offs between the different synopsis types.

    10. Distributed Streams Algorithms for Sliding Windows by Phillip B. Gibbons and Srikanta Tirthapura (http://home.engineering.iastate.edu/~snt/pubs/tocs04.pdf)
  26. @debasishg debasishg revised this gist Dec 31, 2013. 1 changed file with 5 additions and 5 deletions.
    10 changes: 5 additions & 5 deletions gistfile1.md
    @@ -1,9 +1,9 @@
    1. General Background and Overview
    a) Probabilistic Data Structures for Web Analytics and Data Mining : On Highly Scalable Blog (http://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/) : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
    b) Models and Issues in Data Stream Systems : (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.106.9846)
    c) Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani (http://www.vldb.org/conf/2002/S10P03.pdf) : One of the early papers on the subject.
    d) Methods for Finding Frequent Items in Data Streams by Graham Cormode & Marios Hadjieleftheriou (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&type=pdf)
    e) The space complexity of approximating the frequency moments by Noga Alon, Yossi Matias, Mario Szegedy : one of the most influential papers introducing succinctness in computing frequency moments (http://www.tau.ac.il/~nogaa/PDFS/amsz4.pdf)
    * Probabilistic Data Structures for Web Analytics and Data Mining : On Highly Scalable Blog (http://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/) : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
    * Models and Issues in Data Stream Systems : (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.106.9846)
    * Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani (http://www.vldb.org/conf/2002/S10P03.pdf) : One of the early papers on the subject.
    * Methods for Finding Frequent Items in Data Streams by Graham Cormode & Marios Hadjieleftheriou (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&type=pdf)
    * The space complexity of approximating the frequency moments by Noga Alon, Yossi Matias, Mario Szegedy : one of the most influential papers introducing succinctness in computing frequency moments (http://www.tau.ac.il/~nogaa/PDFS/amsz4.pdf)

    2. Hyperloglog and MinHash : Implementation of a form of hyperloglog and adding capabilities of MinHash algorithm on to it which would enable to perform set intersections."While it does require extra processing power to deal with collecting all the minima, it’s possible to get satisfactory performance out of the structure for a relatively low storage or memory footprint" (http://tech.adroll.com/blog/data/2013/07/10/hll-minhash.html)

  27. @debasishg debasishg renamed this gist Dec 31, 2013. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  28. @debasishg debasishg revised this gist Dec 30, 2013. 1 changed file with 4 additions and 3 deletions.
    7 changes: 4 additions & 3 deletions gistfile1.txt
    @@ -1,8 +1,9 @@
    1. General Background and Overview
    a) Probabilistic Data Structures for Web Analytics and Data Mining : On Highly Scalable Blog (http://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/) : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
    b) Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani (http://www.vldb.org/conf/2002/S10P03.pdf) : One of the early papers on the subject.
    c) Methods for Finding Frequent Items in Data Streams by Graham Cormode & Marios Hadjieleftheriou (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&type=pdf)
    d) The space complexity of approximating the frequency moments by Noga Alon, Yossi Matias, Mario Szegedy : one of the most influential papers introducing succinctness in computing frequency moments (http://www.tau.ac.il/~nogaa/PDFS/amsz4.pdf)
    b) Models and Issues in Data Stream Systems : (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.106.9846)
    c) Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani (http://www.vldb.org/conf/2002/S10P03.pdf) : One of the early papers on the subject.
    d) Methods for Finding Frequent Items in Data Streams by Graham Cormode & Marios Hadjieleftheriou (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&type=pdf)
    e) The space complexity of approximating the frequency moments by Noga Alon, Yossi Matias, Mario Szegedy : one of the most influential papers introducing succinctness in computing frequency moments (http://www.tau.ac.il/~nogaa/PDFS/amsz4.pdf)

    2. Hyperloglog and MinHash : Implementation of a form of hyperloglog and adding capabilities of MinHash algorithm on to it which would enable to perform set intersections."While it does require extra processing power to deal with collecting all the minima, it’s possible to get satisfactory performance out of the structure for a relatively low storage or memory footprint" (http://tech.adroll.com/blog/data/2013/07/10/hll-minhash.html)

  29. @debasishg debasishg revised this gist Dec 30, 2013. 1 changed file with 2 additions and 1 deletion.
    3 changes: 2 additions & 1 deletion gistfile1.txt
    @@ -22,11 +22,12 @@

    7. Count-Min Sketch
    a) An Improved Data Stream Summary: The Count-Min Sketch and its Applications - Cormode & Muthukrishnan : The paper that introduced count min sketch (http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf)
    b) faq on count-min sketch that also highlights its differences with bloom filters (https://sites.google.com/site/countminsketch/home/faq)
    b) collection of information on Count Min Sketch (https://sites.google.com/site/countminsketch/)
    c) Count Min Sketch by Cormode : Introductory paper (http://dimacs.rutgers.edu/~graham/pubs/papers/cmencyc.pdf)
    d) Streaming Algorithms and Sketches - Count Min Sketch on AK Tech Blog (http://blog.aggregateknowledge.com/2011/09/13/streaming-algorithms-and-sketches/)
    e) Muthukrishnan talking on Count Min Sketch at AK Tech conference (http://www.youtube.com/watch?v=OOZC4KCErN0)
    f) Sketch Techniques for Approximate Query Processing by Cormode (http://people.cs.umass.edu/~mcgregor/711S12/sketches1.pdf)
    g) Sketching data structures - a good overview of Bloom Filters and Count Min Sketch : (http://lkozma.net/blog/sketching-data-structures/)

    8. Surveys
    a) References for Data Stream Algorithms by Graham Cormode : an exhaustive set of references with explanations (http://dimacs.rutgers.edu/~graham/pubs/papers/bristol.pdf)
  30. @debasishg debasishg revised this gist Dec 30, 2013. 1 changed file with 20 additions and 16 deletions.
    36 changes: 20 additions & 16 deletions gistfile1.txt
    @@ -1,17 +1,24 @@
    1. Hyperloglog and MinHash : Implementation of a form of hyperloglog and adding capabilities of MinHash algorithm on to it which would enable to perform set intersections."While it does require extra processing power to deal with collecting all the minima, it’s possible to get satisfactory performance out of the structure for a relatively low storage or memory footprint" (http://tech.adroll.com/blog/data/2013/07/10/hll-minhash.html)
    1. General Background and Overview
    a) Probabilistic Data Structures for Web Analytics and Data Mining : On Highly Scalable Blog (http://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/) : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
    b) Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani (http://www.vldb.org/conf/2002/S10P03.pdf) : One of the early papers on the subject.
    c) Methods for Finding Frequent Items in Data Streams by Graham Cormode & Marios Hadjieleftheriou (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&type=pdf)
    d) The space complexity of approximating the frequency moments by Noga Alon, Yossi Matias, Mario Szegedy : one of the most influential papers introducing succinctness in computing frequency moments (http://www.tau.ac.il/~nogaa/PDFS/amsz4.pdf)

    2. Streaming/Sketching Conference from AK Tech : Conatins links to videos and slides from the speakers like Muthukrishnan who spoke about Count Min Sketch (http://blog.aggregateknowledge.com/2013/05/23/foundation-capital-and-aggregate-knowledge-sponsor-streamingsketching-conference/)
    2. HyperLogLog and MinHash : An implementation of a form of HyperLogLog extended with MinHash capabilities, which makes set intersections possible. "While it does require extra processing power to deal with collecting all the minima, it’s possible to get satisfactory performance out of the structure for a relatively low storage or memory footprint" (http://tech.adroll.com/blog/data/2013/07/10/hll-minhash.html)

    3. Medians and Beyond: New Aggregation Techniques for Sensor Networks : The paper that introduced q-digest for range queries and quantile approximation (http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf)
    3. Streaming/Sketching Conference from AK Tech : Contains links to videos and slides from the speakers like Muthukrishnan who spoke about Count Min Sketch (http://blog.aggregateknowledge.com/2013/05/23/foundation-capital-and-aggregate-knowledge-sponsor-streamingsketching-conference/)

    4. Two blog posts on Q-Digest
    a) (http://papercruncher.com/2011/07/31/q-digest/)
    b) (http://www.prelert.com/blog/q-digest-an-algorithm-for-computing-approximate-quantiles-on-a-collection-of-integers/)
    c) The Art of Approximating Distributions: Histograms and Quantiles at Scale - an alternative approach to q-digest (http://metamarkets.com/2013/histograms/#)
    4. Q-digest
    a) Medians and Beyond: New Aggregation Techniques for Sensor Networks : The paper that introduced q-digest for range queries and quantile approximation (http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf)
    b) Blog post (http://papercruncher.com/2011/07/31/q-digest/)
    c) Blog post (http://www.prelert.com/blog/q-digest-an-algorithm-for-computing-approximate-quantiles-on-a-collection-of-integers/)
    d) The Art of Approximating Distributions: Histograms and Quantiles at Scale - an alternative approach to q-digest (http://metamarkets.com/2013/histograms/#)

    5. t-digest : A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means. Ted Dunning's variant of q-digest, with some improvements (https://github.com/tdunning/t-digest)

    6. stream-lib : A collection of Stream summarization and cardinality estimation algorithms like CM Sketch, Hyperloglog, Bloom Filters (https://github.com/addthis/stream-lib)
    6. Implementations
    a) stream-lib : A collection of Stream summarization and cardinality estimation algorithms like CM Sketch, Hyperloglog, Bloom Filters (https://github.com/addthis/stream-lib)
    b) Algebird from Twitter : (https://github.com/twitter/algebird)

    7. Count-Min Sketch
    a) An Improved Data Stream Summary: The Count-Min Sketch and its Applications - Cormode & Muthukrishnan : The paper that introduced count min sketch (http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf)
    @@ -21,12 +28,9 @@
    e) Muthukrishnan talking on Count Min Sketch at AK Tech conference (http://www.youtube.com/watch?v=OOZC4KCErN0)
    f) Sketch Techniques for Approximate Query Processing by Cormode (http://people.cs.umass.edu/~mcgregor/711S12/sketches1.pdf)

    8. Data Streams - Algorithms and Applications by S. Muthukrishnan (http://www.amazon.com/Data-Streams-Applications-Foundations-Theoretical/dp/193301914X) : This is an excellent monograph with surveys of all algorithms related to data streams. A free copy of the book is also available from Muthu's web site at http://www.cs.rutgers.edu/~muthu/
    8. Surveys
    a) References for Data Stream Algorithms by Graham Cormode : an exhaustive set of references with explanations (http://dimacs.rutgers.edu/~graham/pubs/papers/bristol.pdf)
    b) Data Streams - Algorithms and Applications by S. Muthukrishnan (http://www.amazon.com/Data-Streams-Applications-Foundations-Theoretical/dp/193301914X) : This is an excellent monograph with surveys of all algorithms related to data streams. A free copy of the book is also available from Muthu's web site at http://www.cs.rutgers.edu/~muthu/
    c) Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches by Graham Cormode, Minos Garofalakis, Peter J. Haas and Chris Jermaine (http://www.nowpublishers.com/articles/foundations-and-trends-in-databases/DBS-004). Describes basic principles and recent developments in approximate query processing. It focuses on four key synopses: random samples, histograms, wavelets, and sketches. It considers issues such as accuracy, space and time efficiency, optimality, practicality, range of applicability, error bounds on query answers, and incremental maintenance. It also discusses the trade-offs between the different synopsis types.

    9. Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches by Graham Cormode, Minos Garofalakis, Peter J. Haas and Chris Jermaine (http://www.nowpublishers.com/articles/foundations-and-trends-in-databases/DBS-004). Describes basic principles and recent developments in approximate query processing. It focuses on four key synopses: random samples, histograms, wavelets, and sketches. It considers issues such as accuracy, space and time efficiency, optimality, practicality, range of applicability, error bounds on query answers, and incremental maintenance. It also discusses the trade-offs between the different synopsis types.

    10. Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani (http://www.vldb.org/conf/2002/S10P03.pdf) : One of the early papers on the subject.

    11. The space complexity of approximating the frequency moments by Noga Alon, Yossi Matias, Mario Szegedy : one of the most influential papers introducing succinctness in computing frequency moments (http://www.tau.ac.il/~nogaa/PDFS/amsz4.pdf)

    12. Distributed Streams Algorithms for Sliding Windows by Phillip B. Gibbons and Srikanta Tirthapura (http://home.engineering.iastate.edu/~snt/pubs/tocs04.pdf)
    10. Distributed Streams Algorithms for Sliding Windows by Phillip B. Gibbons and Srikanta Tirthapura (http://home.engineering.iastate.edu/~snt/pubs/tocs04.pdf)