Quick and reliable propagation.
More applicable to cluster wide metadata: small, infrequent, must be propagated reliably.
Types:
- Broadcast: one to rest.
- Periodically, exchange messages peer-to-peer.
- Co-operative broadcast: node receives messages, spreads messages quickly.
Read-repair, hinted handoff, Merkle trees: see Dynamo paper.
Hinted-handoff vs. sloppy quorums: Voldemort and Cassandra do not implement sloppy quorums (where hints can be read immediately), use HH to catch up nodes that went off-line. AF: Note that these are used for data as opposed to metadata.
Digest reads: request a digest, sync if digest mismatches.
Bitmap Version Vectors: using bitmaps to record encode casual relations, allow identifying missing points.
From Amazon Aurora Ascendant:
The basic idea of a write quorum implies that some segments may not initially receive all of the writes, all of the time. How do those segments deal with gaps in the redo log stream? Aurora storage nodes continuously "gossip" among themselves to fill holes (and perform repairs). Log stream advancement is tightly orchestrated through Log Sequence Number (LSN) management. We use a set of LSN markers to maintain the state of each individual segment.
Dynamo-type systems use version vectors (which they slightly misleadingly call vector clocks) by only having N nodes in a clock (where N is the quorum size) and garbage collecting old entries.
Implementations:
Voldemort Standalone OCaml Riak
Question: does using bitmaps address scale issues (e.g., number of nodes, number of versions)?
Similar to spread of rumors or epidemics. Select f
(fan-out) peers
at random, exchange information. Distribute a message in
Implementation:
Overlay networks: exact certainty, fixed topologies (spanning trees.)
Plumtree: combine epidemic and tree-based approach, only sends full message to a subset of peers, lazily forwards message id to others.
Used by Riak core
Deals with the problem of churn (number of nodes leaving and joining.) Active: participate in dissemination, passive: used to replace the active ones if they fail.
HyParView: Hybrid Partial Views.