Algorithm
----------

Let's say we have a cluster of nodes A < B < C, where A is the leader and B and C are candidates. We want to add a new candidate D to the cluster, and we select node B as the seed node.

When node D is started, the following happens:

(1) D sends a 'join' message to B, and D starts monitoring B.
(2) B replies with a 'hasLeader, A' message.
(3) D starts monitoring A, and D sends an 'isLeader' message to A.
(4) A detects that D is a new node and adds it to the candidate list, assigning it the lowest priority.
(5) A sends an 'update_candidates,Candidates' message to all the candidates that are alive, except D.
(6) A sends a 'ldr' message to D.
(7) D accepts A as the leader, updates its candidate list from the one carried in the 'ldr' message, and starts monitoring all the candidates with higher priority.

When the process finishes, the candidate list will be: A < B < C < D

Some Failure Scenarios
-----------------------

I) B crashes in (1) or (2)

In this case, node D receives a DOWN message and crashes.

II) A crashes in (3)

In this case, node D checks whether node B is still alive. If it is, the joining procedure is restarted. Node B will handle the 'join' message from D once the election procedure is completed. If node B is down, node D crashes.

III) A crashes in (5), right after sending the 'update_candidates' message to B

For node D this case is exactly like II) above. B and C will have different candidate lists when the election procedure begins: B will have "A < B < C < D", and C will have "A < B < C". That won't be a problem because D has the lowest priority, so there's no way that it could be elected.

When the election procedure starts, D will receive a 'halt' message from B. D will take the normal action: set its status to wait and reply with an 'ack' message to B. Later, it will receive a 'ldr' message from B, accept B as leader, update its candidate list and set its status to norm. C will also receive a 'ldr' message and update its candidate list.

If B crashes in the middle of the election procedure, D will crash too.

IV) A crashes right after finishing (5) and before starting (6)

Exactly the same as III) above.

V) D crashes somewhere between (5) and (6)

The rest of the candidates will have D on their lists, so they will treat it like a normal candidate. When D starts again, the joining procedure will be performed normally, except that step (5) will be skipped because all the candidates are already aware of D.
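
The failure-free path of steps (1) to (7), and the skipped step (5) from scenario V, can be sketched as a small single-process simulation. This is only an illustration under simplifying assumptions, not the actual implementation: the names Node, join and cluster are hypothetical, messages are modelled as log entries rather than real asynchronous sends, and crashes, DOWN messages and elections are not modelled at all. Candidate lists are kept in priority order, highest priority first.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    """Hypothetical in-memory stand-in for a cluster node."""
    name: str
    leader: Optional[str] = None
    # Candidate list in priority order, highest priority first.
    candidates: List[str] = field(default_factory=list)
    monitors: Set[str] = field(default_factory=set)
    alive: bool = True


def join(new: Node, seed: Node, cluster: Dict[str, Node]) -> List[str]:
    """Run steps (1)-(7) for `new` and return a log of the messages sent."""
    log = []
    # (1) The new node sends 'join' to the seed and monitors it.
    log.append(f"{new.name}->{seed.name}: join")
    new.monitors.add(seed.name)
    # (2) The seed replies with the current leader.
    leader = cluster[seed.leader]
    log.append(f"{seed.name}->{new.name}: hasLeader,{leader.name}")
    # (3) The new node monitors the leader and sends 'isLeader'.
    new.monitors.add(leader.name)
    log.append(f"{new.name}->{leader.name}: isLeader")
    # (4)+(5) Only when the node is unknown: append it with the lowest
    # priority and notify every other live candidate.
    if new.name not in leader.candidates:
        leader.candidates.append(new.name)
        for name in leader.candidates:
            peer = cluster.get(name)
            if name in (leader.name, new.name) or peer is None or not peer.alive:
                continue
            peer.candidates = list(leader.candidates)
            log.append(f"{leader.name}->{name}: update_candidates")
    # (6) The leader sends 'ldr', carrying the candidate list.
    log.append(f"{leader.name}->{new.name}: ldr")
    # (7) The new node accepts the leader, copies the list and monitors
    # every candidate that has a higher priority than itself.
    new.leader = leader.name
    new.candidates = list(leader.candidates)
    new.monitors.update(new.candidates[:new.candidates.index(new.name)])
    cluster[new.name] = new
    return log


# Cluster A < B < C with A as leader; D joins through seed node B.
cluster = {n: Node(n, leader="A", candidates=["A", "B", "C"]) for n in "ABC"}
d = Node("D")
for message in join(d, cluster["B"], cluster):
    print(message)
print(" < ".join(d.candidates))
```

After the run, every node's candidate list reads A < B < C < D. Calling join again with a fresh Node("D") mimics the restart in scenario V: the leader already lists D, so no 'update_candidates' message appears in the log.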