Topics and questions for discussion with João and Jordan
topics
- peer sampling service vs. gossip service
- how does this map to cassandra? riak?
- where does a failure detection system fall within a gossip system?
random questions
- when João is thinking about 10k node systems, what does he imagine the purpose of that cluster to be, and why use gossip for it? a P2P network, like BitTorrent?
- I've noticed gossip papers tend to paper over the state of a peer: it's either UP or DOWN. There are several more states we have to deal with as practitioners, for example when a node is being removed from a cluster and we need to know when it can be safely expunged by all other peers.
- in c*, we leave the state of a decom'ed node as LEFT for three days, with an explicit wall clock timeout that all peers should obey. Then we quarantine the decom'ed node for ~30 seconds, after which all peers should completely forget about it. The quarantine is kept very short so that a node that restarts within the quarantine window does not still remember the departed node after the timeout.
- PeerSim. how useful is it? an initial look at the code (which seems rarely updated) suggests it is a little rusty.
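
To make the decommission question above concrete, here is a minimal sketch of the lifecycle as described in the c* note: LEFT for three days, a ~30 second quarantine, then forgotten. The names (PeerState, decommissioned_state, QUARANTINED, FORGOTTEN) are hypothetical and are not Cassandra's actual identifiers; only the two durations come from the notes, and the 30 seconds is approximate.

```python
from enum import Enum

# Durations taken from the note above; everything else is an assumption.
LEFT_EXPIRY_SECONDS = 3 * 24 * 60 * 60  # LEFT is kept for three days
QUARANTINE_SECONDS = 30                 # roughly 30 seconds of quarantine


class PeerState(Enum):
    """Hypothetical peer states, beyond the UP/DOWN pair used in papers."""
    UP = "UP"
    DOWN = "DOWN"
    LEFT = "LEFT"
    QUARANTINED = "QUARANTINED"
    FORGOTTEN = "FORGOTTEN"


def decommissioned_state(left_at: float, now: float) -> PeerState:
    """State a peer would hold for a node decommissioned at `left_at`.

    Both arguments are wall clock timestamps in seconds, since the note
    says the timeout is an explicit wall clock value all peers obey.
    """
    elapsed = now - left_at
    if elapsed < LEFT_EXPIRY_SECONDS:
        return PeerState.LEFT
    if elapsed < LEFT_EXPIRY_SECONDS + QUARANTINE_SECONDS:
        return PeerState.QUARANTINED
    return PeerState.FORGOTTEN
```

Because the expiry depends on wall clock time, peers with skewed clocks would move through these states at slightly different moments, which may be worth raising in the discussion.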
João's papers:
- HyParView
- Plumtree
- Thicket