Collaborators:
- Andrei Valasiuk (avalasiuk3)
- Timothy Smith (tsmith484)
In this project, the Paxos protocol was implemented to provide a fault-tolerant KeyValue store. The KeyValue store inherits its behavior from Project 2 and provides at-most-once semantics. The Paxos servers elect a distinguished proposer, the Leader, based on ballots, to reduce the number of round trips and the contention over log slots. Communication among Paxos nodes takes place by means of Proposal Requests and Responses (P2A and P2B) and Heartbeat messages probing the Leader's and Followers' state. Paxos servers monitor the state of the key parties of the protocol and can decide on a view change amid failures and network partitions to ensure liveness. The implemented system guarantees linearizability of commands and consistency of the data stored in the Paxos nodes.
When a server receives a Paxos Request, there are the following scenarios:
- If the command from the Paxos Request has already been processed, the server returns the cached response right away, thereby enforcing AMO semantics, which is handled by AMOApplication.
- If there is only one server in the cluster, the server returns the response right away. There is no need for replication.
- If no Leader is elected, the server initiates the Leader election process, sending a P1A message to all the servers in the cluster in an effort to establish a distinguished proposer based on the server's ballot.
- If the server is not the Leader, the request is discarded.
- If the server is the Leader, it puts the request into the Command Log with the status ACCEPTED and sends a P2A message containing the Command and corresponding metadata to all the Followers.
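The dispatch above can be sketched as follows. This is an illustrative model, not the actual lab code; the class and method names (`RequestDispatch`, `handleRequest`) and the string labels are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the Paxos Request dispatch described above.
// All names here are hypothetical, not the DSLabs API.
class RequestDispatch {
    final Map<String, String> executed = new HashMap<>(); // command id -> cached result (AMO)
    final int clusterSize;
    final boolean isLeader;
    final boolean leaderKnown;

    RequestDispatch(int clusterSize, boolean isLeader, boolean leaderKnown) {
        this.clusterSize = clusterSize;
        this.isLeader = isLeader;
        this.leaderKnown = leaderKnown;
    }

    // Returns a label describing which branch the request takes.
    String handleRequest(String commandId) {
        if (executed.containsKey(commandId)) return "REPLY_CACHED"; // AMO semantics
        if (clusterSize == 1) return "EXECUTE_LOCALLY";             // no replication needed
        if (!leaderKnown) return "START_ELECTION";                  // broadcast P1A
        if (!isLeader) return "DROP";                               // only the Leader proposes
        return "PROPOSE_P2A";                                       // log as ACCEPTED, send P2A
    }
}
```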
Leader election is initiated when a server receives a Paxos Request and no Leader is known. The election itself proceeds as soon as a server receives a P1A. The logic of the handler is as follows:
- If the P1A recipient has a Leader but the ballot from the request (P1A.ballotNumber) is higher, the recipient's Leader is preempted by P1A.ballotNumber, a heartbeat timer for the new Leader is set, the logical timestamp of the last notification from the sender is updated, and a P1A message is sent back to notify the Leader about a new Follower.
- If the P1A recipient does not have a Leader and the ballot from the request is higher, the recipient's Leader is set to P1A.ballotNumber, a heartbeat timer for the new Leader is set, and a P1A message is sent back to notify the Leader about a new Follower.
- If the P1A recipient has a ballot number greater than that of the P1A request, it adds the sender to its list of Followers. If the number of Followers is greater than or equal to servers.length / 2, the current server has at least half of the cluster as its Followers and can manifest itself as the Leader. This concludes the Leader election process and means that the cluster has a stable Leader.

As soon as a server becomes the new Leader, it does the following:
- It sets a HeartbeatTimer to probe the state of the Followers.
- Since each server, when receiving any Paxos message, is given the set of proposals accepted by the sender, a newly elected Leader can have a bunch of proposals to pick up and continue after the former Leader. The new Leader therefore merges all the proposals accepted by other servers and writes them to its own Log as ACCEPTED. If the new Leader sees a proposal accepted by a majority, it marks the slot as CHOSEN, executes the command if applicable, and sends the response to the client right away. There might be a case when the new Leader sees an accepted proposal for a slot i but none for slot i - 1. In this case the new Leader creates a proposal with the KVStore.NoOperation command and promotes it to the Followers.
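The log merge with no-op gap filling can be sketched as below. The names (`LogMerge`, the `"NoOperation"` placeholder string) are illustrative stand-ins for the real log entries and KVStore.NoOperation command.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of how a newly elected Leader could merge proposals accepted by
// other servers and fill log gaps with no-ops. Illustrative names only.
class LogMerge {
    static final String NOOP = "NoOperation";

    // accepted: slot -> command reported as accepted by some server.
    // Returns the merged log with every gap up to the highest slot
    // filled by a NoOperation proposal.
    static TreeMap<Integer, String> merge(Map<Integer, String> accepted) {
        TreeMap<Integer, String> log = new TreeMap<>(accepted);
        if (log.isEmpty()) return log;
        int max = log.lastKey();
        for (int slot = 1; slot <= max; slot++) {
            log.putIfAbsent(slot, NOOP); // no accepted value for this slot
        }
        return log;
    }
}
```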
When a server receives a proposal (P2A message), it performs the following sanity checks at the very beginning:
- No need to process the request if its slot number has already been chosen and applied.
- No need to process requests from non-Leaders.
- If the P2A recipient does not have a Leader, or the sender's ballot number is greater than that of the current Leader, it sends a P1A to the sender to bring its own Followers and Leader up to date.

If all the initial checks pass, the server updates the logical timestamp at which the sender was last seen. It also takes the proposals from the sender (the Leader) and puts them into its own Log if the respective slot is empty. Once that is done, it sends a Proposal Response (P2B message) to the sender.
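A minimal sketch of these checks, with the decision reduced to a label. The class name, parameters, and labels are hypothetical, not the actual handler signature.

```java
// Sketch of the P2A sanity checks described above (illustrative names).
class P2AChecks {
    // Decide how a server reacts to a P2A carrying `senderBallot`.
    static String onP2A(boolean slotAlreadyChosen, boolean senderIsLeader,
                        boolean haveLeader, int senderBallot, int leaderBallot) {
        if (slotAlreadyChosen) return "IGNORE";          // already chosen and applied
        if (!senderIsLeader) return "IGNORE";            // only the Leader may propose
        if (!haveLeader || senderBallot > leaderBallot)
            return "SEND_P1A";                           // re-sync leadership first
        return "ACCEPT_AND_SEND_P2B";                    // record in log, reply with P2B
    }
}
```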
When a server receives a proposal response (P2B message), it performs the following sanity checks at the very beginning:
- No need to process the message if its slot number has already been chosen and applied.
- The message is discarded if the P2B recipient is not the Leader.
- The slot from the message must be present in the Log; otherwise, the message is discarded.
- The command from the message must match the command in the same slot of the Log; otherwise, the message is discarded.

If all the initial checks pass, the server updates the logical timestamp at which the sender was last seen. It also adds the sender to the set of servers that accepted the slot. If the number of accepting servers is greater than or equal to half of the servers in the cluster, the server marks the slot as CHOSEN, executes the command if applicable, and sends the result to the Client.
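The majority tally can be sketched as below. Since the Leader implicitly counts itself, acceptances from half of the cluster plus the Leader form a majority. `P2BTally` and its fields are illustrative names.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the P2B tally: a slot becomes CHOSEN once the acceptors,
// together with the Leader itself, form a majority. Illustrative only.
class P2BTally {
    final Set<String> acceptedBy = new HashSet<>();
    final int clusterSize;

    P2BTally(int clusterSize) { this.clusterSize = clusterSize; }

    // Record a P2B from `sender`; return true when the slot becomes CHOSEN.
    boolean onP2B(String sender) {
        acceptedBy.add(sender);
        // acceptors >= half the cluster => acceptors + Leader form a majority
        return acceptedBy.size() >= clusterSize / 2;
    }
}
```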
As soon as a Leader is elected and its Followers are aware of their Leader, the HeartbeatTimer is set to probe the state of the Paxos participants. The HeartbeatTimer keeps firing as long as the Leader is known to the Followers and as long as the Leader is aware of available Followers that form a majority along with the Leader. Every time the HeartbeatTimer goes off, it checks the last visibility timestamp of:
- The Leader, if the current server is a Follower. If the Leader has not responded over 2 HeartbeatTimer invocations, the Leader is reset to null, which means a round of Leader election is required.
- The Followers, if the current server is the Leader. If not enough Followers to form a majority have responded over 2 HeartbeatTimer invocations, the Leader is reset to null, which means a round of Leader election is required. In addition, the BallotNumber is incremented.

If the Leader and Followers are available to each other, they send Heartbeat messages to one another to inform each other of their accessibility.
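The liveness check behind both bullets can be sketched with logical timer ticks; a peer is presumed dead after missing two consecutive ticks. The class and constant names are illustrative.

```java
// Sketch of the heartbeat liveness check: a peer is considered dead after
// missing two HeartbeatTimer ticks. Names are illustrative.
class LivenessCheck {
    static final int MISSED_TICKS_LIMIT = 2;

    // tickNow: logical tick count of the local HeartbeatTimer.
    // lastSeen: tick at which the peer was last heard from.
    static boolean peerAlive(int tickNow, int lastSeen) {
        return tickNow - lastSeen <= MISSED_TICKS_LIMIT;
    }
}
```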
Every time a server receives a Heartbeat message, it updates the last visibility timestamp of the sender. The rest depends on the role in the protocol:
- If the Heartbeat recipient is the Leader, it takes all the messages accepted by the sender (a Follower) and repeats the operations described in the Handling Proposal Responses paragraph.
- If the Heartbeat recipient is a Follower, it takes all the messages accepted by the sender (the Leader) and repeats the operations described in the Handling Proposals paragraph.
- Once a user invokes a command on the Client, the Client packs a request containing the command itself, a sequence number, and the current view number, and sends it to all the servers. In addition, the Client sets a ClientTimer to guarantee a retry mechanism in case of a timeout. The ClientTimer keeps the request to repeat as an attribute.
- Once the ClientTimer fires, the Client resends the request and resets the timer if no response has arrived from a server and the sequence number of the request sent is still the latest one.
- Once the Client receives a response, it checks whether the response's sequence number corresponds to the sequence number the Client anticipates. If so, the Client accepts the response and may send a new request. If not, the Client discards the response.
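The client-side retry and filtering logic can be sketched as follows. `RetryClient` and its fields are hypothetical names for illustration, not the Project 1 client code.

```java
// Sketch of the client retry loop driven by ClientTimer (illustrative names).
class RetryClient {
    int nextSeqNum = 0;     // sequence number the client expects next
    String pendingResult;   // null until a matching response arrives

    // Called when ClientTimer fires: resend only if still waiting on this request.
    boolean shouldResend(int timerSeqNum) {
        return pendingResult == null && timerSeqNum == nextSeqNum;
    }

    // Accept only the response matching the expected sequence number.
    boolean onResponse(int seqNum, String result) {
        if (seqNum != nextSeqNum) return false; // stale response, discard
        pendingResult = result;
        nextSeqNum++;
        return true;
    }
}
```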
Leader election is triggered by Paxos requests to reduce communication between the nodes while the servers stand idle.
When a Leader is elected, it takes all the messages accepted by other servers and replays them. Once the new Leader finds that a message has been accepted by more than half of the servers, it executes the message and sends the result to the Client.
When the Leader has a command C to execute, it goes through the commands in the Log leading up to C and makes sure they are applicable as well. If so, the Leader applies the preceding commands first, then the command C. If there are applicable commands following C, the Leader applies them too. This guarantees linearizable execution.
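The in-order execution rule can be sketched as a scan that stops at the first slot not yet CHOSEN. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Sketch of in-order execution: commands are applied only up to the first
// slot that is not yet CHOSEN, which preserves linearizability.
class InOrderExec {
    // chosen: slot -> command, for slots already marked CHOSEN.
    // Returns the commands executable in order starting at `firstUnapplied`.
    static List<String> executable(TreeMap<Integer, String> chosen, int firstUnapplied) {
        List<String> run = new ArrayList<>();
        for (int slot = firstUnapplied; chosen.containsKey(slot); slot++) {
            run.add(chosen.get(slot)); // stop at the first gap
        }
        return run;
    }
}
```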
Each node keeps a set of ServerMetadata instances corresponding to the number of servers in the cluster. The attributes of ServerMetadata, along with their purpose, are as follows:
- lastSeen - incremented for the holder of the HeartbeatTimer every time it goes off. For the rest of the servers, lastSeen of server A is updated every time the current server receives a message from server A. It helps estimate how long the current server has not heard from server A.
- myLeader - the server's Leader. For the Leader, myLeader is self.
- accepted - the log of messages accepted by other servers. This log is applied as soon as the server becomes a Leader.
- followers - the set of servers that accepted the leadership of the current server.
Each server keeps a Map<Address, Integer> of the most recent committed slot numbers. This structure is piggybacked onto each Paxos Server message and merged with the recipient's lastCommittedSlots. The idea behind it is that the minimal last committed slot implies that all the servers in the cluster have executed the commands in slots <= min(lastCommittedSlots), which means that all the state before the minimal last committed slot can be cleared to release memory.
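A minimal sketch of this garbage-collection rule, assuming the log is kept as a TreeMap keyed by slot number (the class and method names are illustrative):

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of log garbage collection: every slot at or below the cluster-wide
// minimum committed slot has been executed everywhere and can be dropped.
class LogGc {
    // lastCommitted: server address -> last committed slot it reported.
    static int safeToClearUpTo(Map<String, Integer> lastCommitted) {
        return lastCommitted.values().stream().mapToInt(Integer::intValue).min().orElse(0);
    }

    static void clear(TreeMap<Integer, String> log, int upTo) {
        log.headMap(upTo, true).clear(); // drop slots <= upTo
    }
}
```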
Once a Leader receives a Paxos request, it does not immediately create a new entry in the Log. Instead, it looks up the command in the Log and proceeds with it.
Every message sent between two servers contains:
- A set of decisions made by the servers. This accelerates the delivery of a decision to each server in the group.
- The last committed slot numbers of each server. This approach improves transparency among the servers in the cluster about which commands have been applied on each server. It gives each server clarity about which slots in the Log are no longer necessary to keep and lets each server run a clean-up operation that drops irrelevant slots from the Log. The clean-up is performed at the very beginning of each Paxos Server handler.
- The accepted slots of each server. This helps the servers track accepted slots so that they can be applied when a server becomes a new Leader. The awareness of servers' accepted slots facilitates a recovery mechanism.
To reduce the number of disseminated messages and decrease the response time, Followers accept messages in bulk within the P2A processor, and the Leader likewise chooses commands in bulk within the P2B processor. Both the Followers and the Leader loop through the messages the sender has already accepted and decide whether each message can be chosen (by the Leader) or accepted (by a Follower).
The same operation takes place in the Heartbeat handler.
All the Client's logic is inherited from Project 1.
- Paxos Made Moderately Complex
- Paxos Made Simple
- DSLabs. Lab 3: Paxos
- Discussion Section Slides from DSLabs
- Some paragraphs were taken from the report of Project 2 because the corresponding pieces of logic were not changed.
- Github Copilot as an assistant in writing the report.