- Differential Dataflow: https://github.com/frankmcsherry/timely-dataflow - The code is ugly Rust, but the logic and linked papers are quite interesting.
- Spinning Fast Iterative Dataflows: http://vldb.org/pvldb/vol5/p1268_stephanewen_vldb2012.pdf - Flink's execution model. See also http://us9.campaign-archive2.com/?u=4188b6afbe9e5d43111fef4d4&id=46ab8c2adf&e=04a33a53a5
- Discretized Streams: http://www.cs.berkeley.edu/~matei/papers/2013/sosp_spark_streaming.pdf - Spark Streaming's model of operation.
- Google's Dataflow Model: http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43864.pdf - Now also available as Apache Beam (incubating); see the proposal at https://wiki.apache.org/incubator/BeamProposal
- Algorithms + Data Structures: https://gist.github.com/debasishg/8172796
- Streaming 101: https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
- Streaming 102: https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
- Out of the Fire Swamp:
  - Part I, The Data Crisis: http://us9.campaign-archive1.com/?u=4188b6afbe9e5d43111fef4d4&id=2a4e194559&e=04a33a53a5
  - Part II, Peering into the Mist: http://us9.campaign-archive2.com/?u=4188b6afbe9e5d43111fef4d4&id=afa607ad49&e=04a33a53a5
  - Part III, Go with the Flow: http://us9.campaign-archive2.com/?u=4188b6afbe9e5d43111fef4d4&id=83920fcde0&e=04a33a53a5
- Dancing Calmly with the Devil: http://db.cs.berkeley.edu/jmh/talks/SoCC14-keynote.pdf
- Consistency without Borders: http://people.ucsc.edu/~palvaro/a23-alvaro.pdf
- When Worst is Best in Distributed Systems Design: https://speakerdeck.com/pbailis/when-worst-is-best-in-distributed-systems-design
- RDDs: https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf (interesting for lineage, etc.)
- Redbook 5th Edition: http://www.redbook.io/
- Progressive Systems Seminar: https://sites.google.com/site/progressive294/home/readings