Collaborators:
- Andrei Valasiuk (avalasiuk3)
- Timothy Smith (tsmith484)
In this project, a Paxos protocol was implemented to provide a fault-tolerant KeyValue store. The KeyValue store inherits its behavior from Project 2 and provides At-Most-Once (AMO) semantics. The Paxos servers elect a distinguished proposer, a Leader, based on ballots, which reduces the number of round trips and the contention over Log slots. Communication among Paxos nodes takes place via Proposal Requests and Responses (P2A and P2B) and Heartbeat messages that probe the state of the Leader and the Followers. Paxos servers monitor the state of the key parties of the protocol and can decide on a view change amid failures and network partitions to ensure liveness. The implemented system guarantees linearizability of the commands and consistency of the data stored across the Paxos nodes.
When a server receives a Paxos Request, there are the following scenarios:
- If the command from the Paxos Request has already been processed, the server returns the cached response right away, thereby enforcing the AMO semantics handled by AMOApplication.
- If there is only one server in the cluster, the server executes the command and returns the response right away; there is no need for replication.
- If a Leader is not elected yet, the server initiates the Leader election process, sending a P1A message to all the servers in the cluster to establish a distinguished proposer based on the server's ballot.
- If the server is not the Leader, the request is discarded.
- If the server is the Leader, it puts the request into the Command Log with the status ACCEPTED and sends out a P2A message containing the Command and the corresponding metadata to all the Followers.
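The branching above can be sketched as follows. This is a minimal illustration, not the project's actual code: the class and method names (PaxosServerSketch, handleRequest), the String-based commands, and the String addresses are all assumptions made for the example.

```java
import java.util.*;

// Illustrative sketch of the request-handling branches; all names are hypothetical.
class PaxosServerSketch {
    final Map<String, String> executedResults = new HashMap<>(); // AMO cache: command id -> result
    final int clusterSize;
    final String myAddress;
    String leader; // null until an election completes

    PaxosServerSketch(String myAddress, int clusterSize) {
        this.myAddress = myAddress;
        this.clusterSize = clusterSize;
    }

    /** Returns a reply immediately, or null when the request was queued or discarded. */
    String handleRequest(String commandId, String command) {
        if (executedResults.containsKey(commandId)) {
            return executedResults.get(commandId);      // AMO: already processed
        }
        if (clusterSize == 1) {
            String result = "executed:" + command;       // single node: no replication needed
            executedResults.put(commandId, result);
            return result;
        }
        if (leader == null) {
            startLeaderElection();                       // broadcast P1A (not shown)
            return null;
        }
        if (!leader.equals(myAddress)) {
            return null;                                 // non-leaders discard requests
        }
        appendToLogAndBroadcastP2A(command);             // leader path (not shown)
        return null;
    }

    void startLeaderElection() { /* send P1A to all servers */ }
    void appendToLogAndBroadcastP2A(String command) { /* send P2A to all followers */ }
}
```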
Leader election is initiated when a Server receives a Paxos Request and no Leader is known; the election itself proceeds as Servers receive P1A messages. The logic of the P1A handler is the following:
- If a P1A recipient already has a Leader but the ballot from the request (P1A.ballotNumber) is higher, the recipient's Leader is preempted by P1A.ballotNumber, a heartbeat timer for the new Leader is set, the logical timestamp of the last notification from the sender is updated, and a P1A message is sent back to notify the new Leader about a new Follower.
- If a P1A recipient does not have a Leader and the ballot from the request (P1A.ballotNumber) is higher, the recipient's Leader is set to P1A.ballotNumber, a heartbeat timer for the new Leader is set, and a P1A message is sent back to notify the Leader about a new Follower.
- If a P1A recipient has a ballot number greater than that of the P1A request, it adds the sender to its list of followers. If the number of followers is greater than or equal to servers.length / 2, the current server has at least half of the cluster as its Followers and can declare itself the Leader. This concludes the Leader election process and gives the cluster a stable Leader.
As soon as a server becomes the new Leader, it does the following:
- It sets a HeartbeatTimer to probe the state of the Followers.
- Since every Paxos message carries the sender's set of accepted proposals, a newly elected Leader may have a backlog of proposals to pick up and continue from the former Leader. The new Leader merges all the proposals accepted by other servers and writes them to its Log as ACCEPTED. If the new Leader sees a proposal accepted by a majority, it marks the slot as CHOSEN, executes the command if applicable, and sends the response to the client right away. There might be a case when the new Leader sees an accepted proposal for a slot i but none for slot i - 1; in this case, the new Leader creates a proposal with a KVStore.NoOperation command and promotes it to the Followers.
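The catch-up step of a new Leader can be sketched as below. This is an illustration under simplifying assumptions: slots map to String commands, and a later entry simply overwrites an earlier one when two servers accepted different commands for the same slot (the real protocol keeps the proposal with the highest ballot). The names LeaderCatchUp and NO_OP are hypothetical stand-ins for the project's Log and KVStore.NoOperation.

```java
import java.util.*;

// Sketch of the new-leader catch-up: merge proposals accepted elsewhere into
// the log and fill any slot gaps with a no-op command.
class LeaderCatchUp {
    static final String NO_OP = "NoOperation"; // stand-in for KVStore.NoOperation

    /** Merges accepted proposals (slot -> command) and plugs gaps up to the max slot. */
    static SortedMap<Integer, String> mergeAccepted(List<Map<Integer, String>> acceptedByServer) {
        SortedMap<Integer, String> log = new TreeMap<>();
        for (Map<Integer, String> accepted : acceptedByServer) {
            // Simplification: last writer wins; real Paxos keeps the highest-ballot value.
            log.putAll(accepted);
        }
        if (log.isEmpty()) return log;
        for (int slot = log.firstKey(); slot <= log.lastKey(); slot++) {
            log.putIfAbsent(slot, NO_OP); // no accepted value for this slot: propose a no-op
        }
        return log;
    }
}
```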
When a server receives a proposal (P2A message), it first performs the following sanity checks:
- There is no need to process the request if its slot number has already been chosen and applied.
- There is no need to process requests from non-leaders.
- If the P2A recipient does not have a Leader, or the sender's ballot number is greater than that of the current Leader, the recipient sends a P1A to the sender to refresh the sender's followers and the Leader known to the current server.
If all the initial checks pass, the server updates the logical timestamp at which the sender was last seen. It also takes the message from the sender (the Leader) and puts it into its Log if the respective slot is empty. Once that is done, it sends a Proposal Response (P2B message) back to the sender.
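The P2A checks above can be condensed into a small decision function. This is a sketch, not the real handler: the decision labels, field names, and String addresses are assumptions, and the actual code sends messages rather than returning a label.

```java
import java.util.*;

// Illustrative decision function for the P2A sanity checks; names are hypothetical.
class P2ACheck {
    static String onP2A(Set<Integer> chosenSlots, String currentLeader, int currentBallot,
                        String sender, int senderBallot, int slot) {
        if (chosenSlots.contains(slot)) {
            return "IGNORE";                 // slot already chosen and applied
        }
        if (currentLeader == null || senderBallot > currentBallot) {
            return "SEND_P1A";               // refresh leadership info via a P1A exchange
        }
        if (!sender.equals(currentLeader)) {
            return "IGNORE";                 // proposals are only accepted from the Leader
        }
        return "ACCEPT_AND_SEND_P2B";        // store in the log slot if empty, reply with P2B
    }
}
```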
When a server receives a proposal response (P2B message), it first performs the following sanity checks:
- There is no need to process the request if its slot number has already been chosen and applied.
- The message is discarded if the P2B recipient is not the Leader.
- The message is discarded if the slot from the message is not present in the Log.
- The message is discarded if the command from the message does not match the command in the same slot of the Log.
If all the initial checks pass, the server updates the logical timestamp at which the sender was last seen. It also adds the sender to the set of servers that have accepted the slot. If the number of accepting servers is greater than or equal to half of the servers in the cluster, the Leader marks the slot as CHOSEN, executes the command if applicable, and sends the result to the Client.
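The majority rule above (acceptors >= half of the cluster, with the Leader itself completing the majority) can be written as a one-liner. The class and method names are illustrative, not the project's.

```java
import java.util.*;

// Sketch of the P2B majority rule: with integer division, acceptors >= clusterSize / 2
// plus the Leader itself yields a strict majority of the cluster.
class P2BMajority {
    static boolean isChosen(Set<String> acceptors, int clusterSize) {
        return acceptors.size() >= clusterSize / 2;
    }
}
```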
As soon as a Leader is elected and its Followers are aware of their Leader, the HeartbeatTimer is set to probe the state of the Paxos participants. The HeartbeatTimer keeps firing as long as the Followers know the Leader and as long as the Leader is aware of available Followers that form a majority together with the Leader.
Every time the HeartbeatTimer goes off, it checks the last visibility timestamp of:
- The Leader, if the current server is a Follower. If the Leader has not responded for more than 2 HeartbeatTimer invocations, the Leader is reset to null, which means a new round of Leader election is required.
- The Followers, if the current server is the Leader. If not enough Followers have responded within 2 HeartbeatTimer invocations to form a majority, the Leader is reset to null, which means a new round of Leader election is required. In addition, the BallotNumber is incremented.
If both the Leader and the Followers are available to each other, they send Heartbeat messages to one another to announce their accessibility.
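The two timeout checks above can be sketched as follows, assuming logical time advances by one per HeartbeatTimer firing. The class, method names, and the representation of lastSeen as plain longs are assumptions for the example.

```java
import java.util.*;

// Sketch of the HeartbeatTimer failure checks: a peer is suspected after more
// than 2 timer firings without a message from it.
class HeartbeatCheck {
    static final int MISSED_LIMIT = 2;

    /** Follower side: true when the Leader must be reset to null (new election needed). */
    static boolean leaderTimedOut(long now, long leaderLastSeen) {
        return now - leaderLastSeen > MISSED_LIMIT;
    }

    /** Leader side: true when too few Followers are live to keep a majority. */
    static boolean majorityLost(long now, Collection<Long> followerLastSeen, int clusterSize) {
        long live = followerLastSeen.stream()
                .filter(seen -> now - seen <= MISSED_LIMIT)
                .count();
        return live + 1 < clusterSize / 2 + 1; // +1 counts the Leader itself
    }
}
```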
Every time a server receives a Heartbeat message, it updates the last visibility timestamp of the sender. The rest depends on the server's role in the protocol:
- If the Heartbeat recipient is the Leader, it takes all the messages accepted by the sender (a Follower) and repeats the operations described in the Handling Proposal Responses paragraph.
- If the Heartbeat recipient is a Follower, it takes all the messages accepted by the sender (the Leader) and repeats the operations described in the Handling Proposals paragraph.
- Once a user issues a command to the Client, the Client packs a request containing the command itself, a sequence number, and the current view number and sends it to all the servers. In addition, the Client sets a ClientTimer to provide a retry mechanism in case of a timeout. The ClientTimer keeps the request to repeat as an attribute.
- Once the ClientTimer fires, the Client resends the request and resets the timer if no response has arrived and the sequence number of the sent request is still the latest one.
- Once the Client receives a response, it checks whether the response's sequence number matches the sequence number the Client anticipates. If so, the Client accepts the response and can send a new request. If not, the Client discards the response.
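The sequence-number check on the Client side can be sketched as below. The class name, fields, and String result type are illustrative; the real Client also broadcasts requests and runs the ClientTimer, which this sketch omits.

```java
// Sketch of the client-side sequencing: accept a response only when its
// sequence number matches the one the Client is currently waiting for.
class ClientSketch {
    int expectedSeqNum = 1; // sequence number of the outstanding request
    String result;

    /** Returns true if the response was accepted; stale responses are discarded. */
    boolean onReply(int seqNum, String value) {
        if (seqNum != expectedSeqNum) {
            return false;       // stale or duplicate reply: discard
        }
        result = value;
        expectedSeqNum++;       // ready to issue the next request
        return true;
    }
}
```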
Leader election is triggered by Paxos requests in order to avoid inter-node communication while the servers stand idle.
When a Leader is elected, it takes all the messages accepted by other servers and replays them. Once the new Leader finds out that a message is accepted by more than half of the servers, it executes the message and sends the result to the Client.
When the Leader has a command C to execute, it goes through the other commands in the Log leading up to C and makes sure they are applicable as well. If so, the Leader applies the preceding commands first and then the command C. If there are applicable commands following C, the Leader applies them too. This guarantees linearizable execution.
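The in-order execution rule above can be sketched as applying the contiguous prefix of chosen slots and stopping at the first slot that is not yet chosen. Names are illustrative, and execution is modeled as appending to a list.

```java
import java.util.*;

// Sketch of in-order execution: apply chosen commands in slot order, stopping
// at the first gap, which guarantees a linearizable prefix.
class ExecuteInOrder {
    /** Executes the contiguous chosen prefix; returns the next slot still waiting. */
    static int executeReady(SortedMap<Integer, String> chosen, int firstUnexecuted,
                            List<String> executed) {
        int slot = firstUnexecuted;
        while (chosen.containsKey(slot)) { // contiguous prefix only
            executed.add(chosen.get(slot));
            slot++;
        }
        return slot;
    }
}
```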
Each node keeps a set of ServerMetadata instances corresponding to the number of servers in the cluster. The attributes of ServerMetadata, along with their purposes, are as follows:
- lastSeen - incremented for the holder of the HeartbeatTimer every time it goes off. For the other servers, lastSeen of server A is updated every time the current server receives a message from server A. It helps estimate for how long the current server has not heard from server A.
- myLeader - the server's Leader. For the Leader, myLeader is itself.
- accepted - the log of accepted messages from other servers. This log is applied as soon as a server becomes a Leader.
- followers - a set of servers that have accepted the leadership of the current server.
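The bookkeeping above could look roughly like the following class. This is a guess at the shape, not the project's actual definition: Address is simplified to String, and the field types are assumptions.

```java
import java.util.*;

// Hypothetical shape of the per-peer bookkeeping described above.
class ServerMetadata {
    long lastSeen;                                           // logical timestamp of last contact
    String myLeader;                                         // this server's view of the Leader
    final Map<Integer, String> accepted = new HashMap<>();   // slot -> accepted command
    final Set<String> followers = new HashSet<>();           // servers accepting our leadership
}
```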
Each server keeps a Map<Address, Integer> of the most recent committed slot numbers. This structure is piggybacked on each Paxos Server message and merged into the recipient's lastCommittedSlots. The idea is that all the servers in the cluster have executed the commands in slots <= min(lastCommittedSlots), which means that all the state before the minimal last committed slot can be cleared to release memory.
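The clean-up rule above can be sketched in a few lines. Address is simplified to String and commands to Strings; the class and method names are illustrative.

```java
import java.util.*;

// Sketch of log compaction: every server can safely drop slots up to the
// minimum last-committed slot across the whole cluster.
class LogCleanup {
    static void compact(Map<String, Integer> lastCommittedSlots,
                        SortedMap<Integer, String> log) {
        int safe = Collections.min(lastCommittedSlots.values());
        log.headMap(safe + 1).clear(); // drop every slot <= min(lastCommittedSlots)
    }
}
```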
Once a Leader receives a Paxos request, it does not immediately create a new entry in the Log. Instead, it looks up the command in the Log and proceeds with it.
Every message sent between two servers contains:
- A set of decisions made by the servers. This is done to accelerate the delivery of a decision to each server in a group.
- Last committed slot numbers of each server. This approach improves the transparency among the servers in the cluster about what commands have been applied to each server. It gives each server clarity about which slots in the Log are no longer necessary to keep and lets each server run a clean-up operation that drops irrelevant slots in the Log. The clean-up is performed at the very beginning of each Paxos Server's handler.
- Accepted slots of each server. This helps the servers track accepted slots to be able to apply them in case when a server becomes a new Leader. The awareness of servers' accepted slots facilitates a recovery mechanism.
To reduce the number of disseminated messages and decrease the response time, Followers accept messages in bulk within the P2A handler, and the Leader chooses commands in bulk within the P2B handler. Both the Followers and the Leader loop through the messages the sender has already accepted and decide whether each message can be chosen (on the Leader) or accepted (on a Follower). The same operation takes place in the Heartbeat handler.
All the Client's logic is inherited from Project 1.
- Paxos Made Moderately Complex
- Paxos Made Simple
- DSLabs. Lab 3: Paxos
- Discussion Section Slides from DSLabs
- Some paragraphs were taken from the report of Project 2 because the corresponding pieces of logic were not changed.
- Github Copilot as an assistant in writing the report.