Notes about consensus algorithms, consensus implementations and approaches

Consensus

Consensus is a fundamental problem and one of the most important abstractions for distributed and multi-agent systems: achieving overall system reliability in the presence of a number of faulty processes. This often requires processes to agree on some data value that is needed during computation.

A consensus algorithm requires at least a majority of nodes to be functioning correctly so that the majority can form a quorum. The termination property of consensus is subject to the assumption that fewer than half of the nodes are crashed or unreachable. However, most implementations of consensus ensure that the safety properties - agreement, integrity and validity - are always met, even if a majority of nodes fail or there is a severe network problem. This means a large-scale outage can stop the system from being able to process requests, but it cannot corrupt the consensus system by causing it to make invalid decisions.

To put the previous paragraph into an example: we need a minimum of three nodes to tolerate one failure, a minimum of five nodes to tolerate two failures, and so on. In practice this means that if a network failure cuts off some nodes from the rest, only the majority partition can continue to operate; the rest is blocked (this avoids split-brain). Most consensus algorithms assume a fixed set of nodes.
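To make that arithmetic concrete, here is a minimal plain-Java sketch (not tied to any consensus library; the class and method names are purely illustrative) that computes the quorum size and tolerated failures for small clusters:

```java
// Illustrative only: the majority-quorum arithmetic behind the
// "3 nodes tolerate 1 failure, 5 nodes tolerate 2" rule above.
public class QuorumMath {

    // Smallest number of nodes that still forms a majority of n.
    static int quorumSize(int n) {
        return n / 2 + 1;
    }

    // Failures an n-node cluster can survive while keeping a quorum.
    static int toleratedFailures(int n) {
        return n - quorumSize(n);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 7; n++) {
            System.out.printf("n=%d quorum=%d tolerates=%d failure(s)%n",
                    n, quorumSize(n), toleratedFailures(n));
        }
    }
}
```

Note that even cluster sizes don't help: n=4 still tolerates only one failure, which is why odd sizes (3, 5, 7) are the usual recommendation.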

A consensus protocol tolerating halting failures must satisfy the following properties:

  • Termination: Every correct process decides some value.

  • Validity: If all processes propose the same value v, then all correct processes decide v.

  • Integrity: Every correct process decides at most one value, and if it decides some value v, then v must have been proposed by some process.

  • Agreement: Every correct process must agree on the same value.
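For intuition, these properties can be read against a minimal propose/decide contract. The interface below is hypothetical and purely illustrative (real implementations like Raft or Zab expose much richer APIs):

```java
// Hypothetical interface, for illustration only: every correct process
// proposes a value and eventually decides exactly one value.
public interface Consensus<V> {

    // Submit this process's candidate value. Validity: if every process
    // proposes the same v, the decision must be v.
    void propose(V value);

    // Block until a decision is reached (Termination). The returned value
    // was proposed by some process (Integrity) and is the same value every
    // correct process decides (Agreement), decided at most once.
    V awaitDecision() throws InterruptedException;
}
```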

Known algorithm implementations (most used)

The best known fault-tolerant consensus algorithms are Viewstamped Replication (VSR), Paxos, Raft and Zab.

  • etcd (Raft) - Used by Kubernetes

  • Atomix (Raft) - Implementation used by the Atomix library; it requires us to embed this in each application, so it's not recommended if we're running too many nodes.

  • Zookeeper (Zab) - This is the implementation used by Zookeeper; we could use the Spring Cloud support for it if we go this way. The drawback is that we need an available Zookeeper cluster.

  • Chubby (Paxos) - Google uses the Paxos algorithm in its Chubby distributed lock service and in its Spanner database, but due to its complexity and many variations it's not really the recommended initial approach.

Interesting references

Paxos vs Zab

https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab+vs.+Paxos

Raft

This page has a "live" demonstration of how Raft works; it's very useful for illustration and clarification:

https://raft.github.io/

etcd - CoreOS service that implements Raft; it's the default for Kubernetes.

Other references: Wikipedia (Paxos), Apache Zookeeper (Zab), Raft.

Spring support

As of this date (January 9th, 2018), Spring Cloud offers some support for distributed locking but not for leader election, and it looks like it also requires some effort and isn't 100% reliable; see these tickets for instance:

spring-cloud/spring-cloud-commons#173 spring-cloud/spring-cloud-zookeeper#93
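For reference, the Zookeeper-backed locking that does exist today comes via Spring Integration's ZookeeperLockRegistry (built on Curator). A rough sketch, where the connection string and lock key are placeholders:

```java
import java.util.concurrent.locks.Lock;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.springframework.integration.zookeeper.lock.ZookeeperLockRegistry;

public class ZkLockSketch {
    public static void main(String[] args) {
        // "localhost:2181" is a placeholder Zookeeper connection string.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        ZookeeperLockRegistry registry = new ZookeeperLockRegistry(client);
        Lock lock = registry.obtain("shared-resource"); // placeholder key
        lock.lock();
        try {
            // Critical section: only one instance across the cluster at a time.
        } finally {
            lock.unlock();
            client.close();
        }
    }
}
```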

Final considerations

After carefully studying this for a while, it looks like the best approach is to use Zookeeper with Apache Curator, especially if one of the existing Curator recipes satisfies our needs; see the sketch below.
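As an illustration of the recipe route, here is a minimal sketch of Curator's LeaderLatch recipe (the connection string and latch path are placeholders):

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LeaderLatchSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string and election path.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        try (LeaderLatch latch = new LeaderLatch(client, "/my-app/leader")) {
            latch.start();
            latch.await(); // blocks until this instance is elected leader
            if (latch.hasLeadership()) {
                // Leader-only work goes here. Leadership can be lost if the
                // Zookeeper session drops, so long-running work should
                // re-check hasLeadership() periodically.
            }
        } finally {
            client.close();
        }
    }
}
```

Curator also ships a LeaderSelector recipe if leadership should rotate among instances instead of sticking to one.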

Another possible option that makes sense for a lot of systems is to use a database lock, which requires polling but just works; in this case the DB might be a single point of failure, but in many cases that's acceptable. A sketch follows below.
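A minimal sketch of that database-lock idea, using a leased-lock row and PostgreSQL-flavoured SQL; the leader_lock table, its columns and the 'singleton' row name are all made up for illustration:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class DbLeaderLock {

    // Try to grab (or renew) the lease; returns true if we are the leader.
    // Assumes a table leader_lock(name PK, owner, expires_at) seeded with
    // one row where name = 'singleton'. Every instance polls this method
    // on an interval shorter than the lease.
    static boolean tryAcquire(Connection conn, String owner, int leaseSeconds)
            throws SQLException {
        String sql =
            "UPDATE leader_lock "
          + "SET owner = ?, expires_at = NOW() + (? * INTERVAL '1 second') "
          + "WHERE name = 'singleton' "
          + "AND (owner = ? OR expires_at < NOW())";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, owner);
            ps.setInt(2, leaseSeconds);
            ps.setString(3, owner);
            // Exactly one updated row means we hold (or renewed) the lease.
            return ps.executeUpdate() == 1;
        }
    }
}
```

The lease expiry is what prevents a crashed leader from holding the lock forever, at the cost of some sensitivity to clock skew.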

There's also the option to implement a lightweight consensus engine using the Atomix library, which provides a strong Raft implementation, but this would require more effort and further testing to validate the solution.

There's always the option of manual failover using configuration (requires human intervention).

Finally, there's the possibility of relying on etcd, but like the previous choice this would also require a lot of effort.

Other options

I didn't go too deep investigating how we could rely on Consul with Nomad to achieve what we need. They are also possible solutions, but because Zookeeper is better known at the moment it makes sense to focus on it; it also looks like achieving what we need would require a combination of both Consul and Nomad (Nomad relies on Consul for some features).
