Created July 30, 2010 03:17
Dynamic addition of candidate nodes in gen_leader
Algorithm
----------
Let's say we have a cluster of nodes A < B < C where A is the leader, and B and C are
candidates. We want to add a new candidate D to the cluster, and we select node B as
the seed node.
When node D is started, the following happens:
(1) D sends a 'join' message to B, and D starts monitoring B
(2) B replies with a 'hasLeader, A' message
(3) D starts monitoring A, and D sends an 'isLeader' message to A
(4) A detects that D is a new node and adds it to the candidates list, assigning it the lowest priority
(5) A sends an 'update_candidates, Candidates' message to all the candidates that are alive except D
(6) A sends a 'ldr' message to D
(7) D accepts A as the leader, replaces its candidates list with the one carried in the 'ldr' message and
starts monitoring all the candidates with higher priority.
When the process finishes, the candidates list will be: A < B < C < D
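
The seven steps can be traced with a small simulation. This is only an illustrative sketch in Python, not gen_leader code: the `Node` class, the `join` function and the message log are hypothetical names, messages are modeled as plain strings, and every candidate is assumed to stay alive for the whole procedure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    name: str
    leader: Optional[str] = None
    # Candidates ordered from highest to lowest priority (A < B < C).
    candidates: List[str] = field(default_factory=list)
    monitoring: Set[str] = field(default_factory=set)


def join(cluster: Dict[str, Node], new_name: str, seed_name: str) -> List[str]:
    """Run steps (1)-(7) without failures and return the messages in order."""
    log = []
    new = Node(new_name)
    cluster[new_name] = new
    seed = cluster[seed_name]

    # (1) The new node sends 'join' to the seed and monitors it.
    new.monitoring.add(seed_name)
    log.append(f"{new_name} -> {seed_name}: join")

    # (2) The seed replies with the current leader.
    leader = cluster[seed.leader]
    log.append(f"{seed_name} -> {new_name}: hasLeader, {leader.name}")

    # (3) The new node monitors the leader and sends 'isLeader'.
    new.monitoring.add(leader.name)
    log.append(f"{new_name} -> {leader.name}: isLeader")

    # (4) The leader appends the new node, giving it the lowest priority.
    if new_name not in leader.candidates:
        leader.candidates.append(new_name)

    # (5) The leader updates every candidate except itself and the new node.
    #     Assumption: all of them are alive.
    for name in leader.candidates:
        if name not in (leader.name, new_name):
            cluster[name].candidates = list(leader.candidates)
            log.append(f"{leader.name} -> {name}: update_candidates")

    # (6) The leader sends 'ldr' to the new node.
    log.append(f"{leader.name} -> {new_name}: ldr")

    # (7) The new node accepts the leader, copies the list, and monitors
    #     every candidate with a higher priority than its own.
    new.leader = leader.name
    new.candidates = list(leader.candidates)
    own_index = new.candidates.index(new_name)
    new.monitoring.update(new.candidates[:own_index])
    return log


def make_cluster() -> Dict[str, Node]:
    """Build the example cluster A < B < C with A as leader."""
    names = ["A", "B", "C"]
    return {n: Node(n, leader="A", candidates=list(names)) for n in names}
```

Calling `join(make_cluster(), "D", "B")` walks through the exchange in the order listed above and leaves every node with the list A < B < C < D.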
Some Failure Scenarios
-----------------------
I) B crashes in (1) or (2)
In this case, node D would receive a DOWN message and crash.
II) A crashes in (3)
In this case, node D would check whether node B is still alive. If it is, the joining procedure is
restarted. Node B will handle the 'join' message from D once the election procedure has completed.
If node B is down, then node D will crash.
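
Scenarios I) and II) amount to one small decision on the joining node when a DOWN message arrives. The sketch below is a hypothetical Python helper, not part of gen_leader; the function name and the returned strings are made up for illustration, and the "ignore" branch for unrelated nodes is an assumption the text does not cover.

```python
from typing import Optional


def joiner_reaction(down_node: str, seed: str, leader: Optional[str],
                    seed_alive: bool) -> str:
    """Decide what the joining node does when a monitored node goes down.

    leader is None until the 'hasLeader' reply of step (2) has arrived.
    """
    if down_node == seed:
        # Scenario I: the seed died, so the joining node crashes too.
        return "crash"
    if leader is not None and down_node == leader:
        # Scenario II: the leader died; retry through the seed if possible.
        return "restart_join" if seed_alive else "crash"
    # Assumption: DOWN messages from any other node are not relevant here.
    return "ignore"
```

With B as the seed and A as the leader, `joiner_reaction("A", "B", "A", seed_alive=True)` yields `"restart_join"`, while the same call with `seed_alive=False` yields `"crash"`.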
III) A crashes in (5) right after sending the 'update_candidates' message to B.
For node D this case is exactly like II) above.
B and C will have different candidate lists when the election procedure begins. B will have
"A < B < C < D", and C will have "A < B < C".
That won't be a problem because D has the lowest priority, so there's no way that it could be
elected while B or C is alive.
When the election procedure starts, D will receive a 'halt' message from B. D will take the
normal action: set its status to wait and reply with an 'ack' message to B. Later, it will
receive a 'ldr' message from B, accept B as leader, update its candidates list and set its
status to norm. C will also receive a 'ldr' message and update its candidates list.
If B dies in the middle of the election procedure, D will die too.
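
The argument that the two lists cannot produce different leaders can be checked mechanically. The sketch below uses a deliberately simplified election rule (the highest-priority live candidate wins), which is an assumption and not the actual gen_leader election protocol; `elect` and `views_agree` are hypothetical names.

```python
from itertools import combinations
from typing import Iterable, List, Optional, Set


def elect(view: List[str], alive: Set[str]) -> Optional[str]:
    """Return the first live node in the view (highest priority first)."""
    for name in view:
        if name in alive:
            return name
    return None


def views_agree(view_a: List[str], view_b: List[str],
                old_members: Iterable[str], new_node: str) -> bool:
    """Check that both views elect the same node for every non-empty set
    of surviving old members, with the new node also alive."""
    old = list(old_members)
    for size in range(1, len(old) + 1):
        for survivors in combinations(old, size):
            alive = set(survivors) | {new_node}
            if elect(view_a, alive) != elect(view_b, alive):
                return False
    return True


b_view = ["A", "B", "C", "D"]  # B received 'update_candidates' before A died
c_view = ["A", "B", "C"]       # C did not
```

Here `elect(b_view, {"B", "C", "D"})` and `elect(c_view, {"B", "C", "D"})` both return `"B"`, and `views_agree(b_view, c_view, ["B", "C"], "D")` holds because D sits at the end of the only list that contains it.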
IV) A dies right after finishing (5) and before starting (6)
Exactly the same thing as III) above.
V) D crashes somewhere between (5) and (6)
The rest of the candidates will have D on their lists, so they will treat it like a normal
candidate. When D starts again, the joining procedure will be performed normally, except that
step (5) will be skipped because all the candidates are already aware of D.
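
The leader's side of scenario V) can be sketched as a single handler for the 'isLeader' message. This is an illustrative Python sketch under stated assumptions: `handle_is_leader` is a hypothetical name, messages are modeled as `(recipient, tag)` tuples, and all listed candidates are assumed to be alive.

```python
from typing import List, Tuple


def handle_is_leader(leader: str, candidates: List[str],
                     sender: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return the leader's updated candidates list and the messages it sends."""
    updated = list(candidates)
    messages = []
    if sender not in updated:
        # Steps (4) and (5): a new node is appended with the lowest priority
        # and the other candidates are told about it.
        updated.append(sender)
        for name in updated:
            if name not in (leader, sender):
                messages.append((name, "update_candidates"))
    # Step (6): the sender always gets a 'ldr' message, whether new or rejoining.
    messages.append((sender, "ldr"))
    return updated, messages
```

On a first join, `handle_is_leader("A", ["A", "B", "C"], "D")` produces updates for B and C followed by a 'ldr' for D. When D restarts and is already on the list, the same call with `["A", "B", "C", "D"]` produces only the 'ldr' message.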
hello,
did you implement this in the gen_leader module ?
thanks