Consensus is a fundamental problem and one of the most important abstractions for distributed and multi-agent systems: achieving overall system reliability in the presence of a number of faulty processes. It typically requires processes to agree on some data value that is needed during computation.
A consensus algorithm requires at least a majority of nodes to be functioning correctly so that a majority can form a quorum. The termination property of consensus is subject to the assumption that fewer than half of the nodes are crashed or unreachable. However, most implementations of consensus ensure that the safety properties - agreement, integrity and validity - are always met, even if a majority of nodes fail or there is a severe network problem. In other words, a large-scale outage can stop the system from being able to process requests, but it cannot corrupt the consensus system by causing it to make invalid decisions.
To put the previous paragraph into a concrete example: we need a minimum of three nodes to tolerate one failure, a minimum of five nodes to tolerate two failures, and so on. In practice this means that if a network failure cuts off some nodes from the rest, only the majority partition can continue to make progress; the rest is blocked (this avoids split-brain). Most consensus algorithms also assume a fixed set of nodes.
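As a small illustration of the quorum arithmetic above, the following sketch computes the quorum size and the number of tolerated failures for a few cluster sizes (the class and method names are purely illustrative, not from any library):

```java
// Minimal sketch of quorum sizing for a consensus cluster of N nodes.
public final class QuorumMath {

    // Smallest number of nodes that forms a majority of the cluster.
    static int quorumSize(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    // Number of node failures the cluster can tolerate while still
    // being able to form a quorum.
    static int toleratedFailures(int clusterSize) {
        return (clusterSize - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 7; n += 2) {
            System.out.printf("cluster=%d quorum=%d tolerated failures=%d%n",
                    n, quorumSize(n), toleratedFailures(n));
        }
        // cluster=3 quorum=2 tolerated failures=1
        // cluster=5 quorum=3 tolerated failures=2
        // cluster=7 quorum=4 tolerated failures=3
    }
}
```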
A consensus protocol must satisfy the following properties:
- Termination: Every correct process decides some value.
- Validity: If all processes propose the same value v, then all correct processes decide v.
- Integrity: Every correct process decides at most one value, and if it decides some value v, then v must have been proposed by some process.
- Agreement: Every correct process must agree on the same value.
The best-known fault-tolerant consensus algorithms are Viewstamped Replication (VSR), Paxos, Raft and Zab.
- etcd (Raft) - Used by Kubernetes.
- Atomix (Raft) - The implementation used by the Atomix library; it requires us to embed it in each application, so it is not recommended if we are running too many nodes.
- Zookeeper (Zab) - The implementation used by Zookeeper; we could use the Spring Cloud support for it if we go this route. The drawback is that we need an available Zookeeper cluster.
- Chubby (Paxos) - Google uses the Paxos algorithm in their Chubby distributed lock service and in their Spanner database, but due to its complexity and variations it is not really the recommended initial approach.
https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab+vs.+Paxos
This page has a "live" demonstration of how Raft works; it is very useful for illustration and clarification:
etcd - a CoreOS service that implements Raft and is the default for Kubernetes.
Further reading: Paxos (Wikipedia), Zab (Apache Zookeeper), Raft.
At this date (January 9th, 2018), Spring Cloud offers some support for distributed locking but not for leader election, and it looks like it also requires some effort and is not 100% reliable; see these tickets for instance:
spring-cloud/spring-cloud-commons#173 spring-cloud/spring-cloud-zookeeper#93
After carefully studying this for a while, it looks like the best approach is to use Zookeeper with Apache Curator, especially if one of the existing Curator recipes satisfies our needs.
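As a rough idea of what this could look like, here is a minimal sketch of leader election using Curator's LeaderLatch recipe; the Zookeeper connection string, latch path and participant id below are placeholder assumptions, not values from our environment:

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LeaderElectionExample {
    public static void main(String[] args) throws Exception {
        // Connect to an existing Zookeeper ensemble (placeholder connection string).
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181",
                new ExponentialBackoffRetry(1000, 3));
        client.start();

        // All candidate nodes create a latch on the same path; Zookeeper picks one leader.
        LeaderLatch latch = new LeaderLatch(client, "/myapp/leader", "node-1");
        latch.start();

        // Blocks until this participant becomes leader; in a real service a
        // LeaderLatchListener callback is usually preferable to blocking.
        latch.await();
        System.out.println("Has leadership: " + latch.hasLeadership());

        // ... leader-only work ...

        latch.close();   // gives up leadership
        client.close();
    }
}
```

Curator also provides a LeaderSelector recipe with a listener callback, which may be a better fit if the leader-only work is long-running.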
Another option that makes sense for a lot of systems is to use a database lock, which requires polling but just works; in this case the DB might be a single point of failure, but in many cases that is acceptable.
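To illustrate the database-lock approach, here is a hedged sketch of a polling lease over a hypothetical single-row table `leader_lock (id INT PRIMARY KEY, owner VARCHAR(64), expires_at TIMESTAMP)`; the table, column names, JDBC URL and node id are assumptions for the example, not an existing schema:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Timestamp;
import java.time.Instant;

public class DbLeaderLease {

    // Tries to take or renew the single lease row; returns true if this node
    // now holds it. Only one node can win the UPDATE, because the row is only
    // writable when the previous lease has expired or already belongs to us.
    static boolean tryAcquire(Connection conn, String nodeId, int leaseSeconds) throws Exception {
        String sql = "UPDATE leader_lock SET owner = ?, expires_at = ? "
                   + "WHERE id = 1 AND (owner = ? OR expires_at < ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            Instant now = Instant.now();   // assumes node clocks are roughly in sync
            ps.setString(1, nodeId);
            ps.setTimestamp(2, Timestamp.from(now.plusSeconds(leaseSeconds)));
            ps.setString(3, nodeId);
            ps.setTimestamp(4, Timestamp.from(now));
            return ps.executeUpdate() == 1;
        }
    }

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://db-host/myapp", "user", "pass")) {
            while (true) {
                if (tryAcquire(conn, "node-1", 30)) {
                    // ... leader-only work, kept shorter than the lease ...
                }
                Thread.sleep(10_000);   // polling interval
            }
        }
    }
}
```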
There is also the option of implementing a lightweight consensus engine using the Atomix library, which provides a solid Raft implementation, but this would require more effort and further testing to validate the solution.
There is always the option of manual failover using configuration (requires human intervention).
Finally, there is the possibility of relying on etcd, but like the previous choice this would also require a lot of effort.
I did not go too deep into investigating how we could rely on Consul with Nomad to achieve what we need. Those are also possible solutions, but because Zookeeper is better known at the moment it makes sense to focus on it; it also looks like achieving what we need would require a combination of both Consul and Nomad (Nomad relies on Consul for some features).