Skip to content

Instantly share code, notes, and snippets.

@camjo
Created August 25, 2018 03:45
Show Gist options
  • Save camjo/75ca375d6569b790024016b75d3f6e90 to your computer and use it in GitHub Desktop.
Save camjo/75ca375d6569b790024016b75d3f6e90 to your computer and use it in GitHub Desktop.
Another example API (less complete) based on the beam model where everything is streams and tables
// Following model based on the description of
// Apache Beam in the book Streaming Systems by Reuven Lax; Tyler Akidau; Slava Chernyak
/**
* Tables are data at rest, and act as a container for data to accumulate and be observed over time.
* K = Key
* V = Value
* W = Window
* P = Partition
*/
type Table[K, V, W, P]
/**
* Streams are data in motion, and encode a discretized view of the evolution of a table over time.
* K = Key
* V = Value
* W = Window
* P = Partition
*/
type Stream[K, V, W, P]
/**
* Applying nongrouping operations to a stream alters the data in the stream while leaving them in motion,
* yielding a new stream with possibly different cardinality.
*
* Stream -> Stream
*/
trait NonGroupingOperations {
def map[K, V1, V2, W, P](stream: Stream[K, V1, W, P])(f: V1 => V2): Stream[K, V2, W, P]
def filter[K, V, W, P](stream: Stream[K, V, W, P])(f: V => Boolean): Stream[K, V, W, P]
def assignWindow[K, V, W1, W2, P](stream: Stream[K, V, W1, P])(window: W2): Stream[K, V, W2, P]
}
/**
* Grouping data within a stream brings those data to rest, yielding a table that evolves over time.
*
* - Windowing incorporates the dimension of event time into such groupings.
*
* - Merging windows dynamically combine over time, allowing them to reshape themselves in response to the data
* observed and dictating that key remain the unit of atomicity/parallelization, with window being a child
* component of grouping within that key.
*
* Stream -> Table
*/
trait GroupingOperations {
}
/**
* Triggering data within a table ungroups them into motion, yielding a stream that captures a view of the table’s
* evolution over time.
*
* - Watermarks provide a notion of input completeness relative to event time, which is a useful reference point
* when triggering event-timestamped data, particularly data grouped into event-time windows from unbounded
* streams.
*
* - The accumulation mode for the trigger determines the nature of the stream, dictating whether it contains
* deltas or values, and whether retractions for previous deltas/values are provided.
*
* Table -> Stream
*/
trait UnGroupingOperations {
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment