- Differential Dataflow - The code is ugly Rust, but the logic and linked papers are quite interesting.
- Spinning Fast Iterative Dataflows - Flink's execution model. Also covered in The Morning Paper.
- Discretized Streams - Spark Streaming's model of operation.
- Google's Dataflow Model - Now also available as Apache Beam (incubating).
- Kafka Streams - Kafka offers "hipster stream processing," and a nice unification between tables and streams.
- Out of the Fire Swamp
- Dancing Calmly with the Devil
- Consistency without Borders
- When Worst is Best in Distributed System Design
- RDDs - Interesting for lineage tracking, etc.
- Redbook 5th Edition
- Progressive Systems Seminar
- Ground - Open source system for metadata versioning and lineage management. New slide deck.
- Confluent Schema Registry - Kafka-centric Avro schema and metadata management.
- Arrow - Powering Columnar In-Memory Analytics.
- Parquet - Columnar storage format available to any project in the Hadoop ecosystem.
- Alluxio - FKA Tachyon. Memory-centric virtual distributed storage system.
- Filo - Fast, memory-efficient, minimal-serialization, binary data vectors.
- Feather - Fast, interoperable binary data frame storage for Python, R, etc., powered by Apache Arrow.
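
Several of the formats above (Arrow, Parquet, Feather) are columnar. As a rough illustration of what that means, here is a minimal sketch in plain Python. It does not use any of these libraries' APIs; the record fields and the `to_columns` helper are hypothetical names made up for this example.

```python
# Hypothetical records; the field names are illustrative only.
rows = [
    {"user": "a", "clicks": 3, "country": "DE"},
    {"user": "b", "clicks": 5, "country": "US"},
    {"user": "c", "clicks": 2, "country": "DE"},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of per-field lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row-oriented aggregation visits every record to read one field.
total_row_oriented = sum(record["clicks"] for record in rows)

# Column-oriented aggregation reads a single list of same-typed values.
total_columnar = sum(columns["clicks"])
```

Both totals are the same; the difference is that the columnar form keeps all values of one field together, which is the property that real columnar formats build on for scans and compression.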