In the Raft algorithm, can a follower become leader after it gets disconnected from the current leader?


Consider the following scenario in the Raft algorithm.

  1. A leader is currently available in the cluster.
  2. The log is up to date in all nodes.
  3. One follower gets disconnected from the leader, while all other followers are still connected to the leader and are receiving heartbeats.
  4. The disconnected follower becomes a candidate and starts an election for a new term, since it no longer receives heartbeats from the leader.

Would the other followers vote for the new candidate and elect it as the leader, while the previous leader is also still healthy?


There are 2 best solutions below

AndrewR On BEST ANSWER

"Would the other followers vote for the new candidate and elect it as the leader, while the previous leader is also still healthy?" The answer is yes: voting does not depend on whether the existing leader is healthy.

According to the Raft paper (https://raft.github.io/raft.pdf, page 4), a follower that stops hearing heartbeats becomes a candidate: it increments its term and requests votes. The other nodes have to vote yes, since all the conditions are met. In fact, even the leader will vote yes and stop being leader when it receives such a vote request.

This is the sequence of events:

  • given the cluster is stable and the term number is 10; five nodes A, B, C, D, E, and A is the leader
  • B stops getting heartbeats from A
  • B will become the candidate at some point
  • B will increase its term to 11
  • B will initiate voting
  • all nodes will vote yes, because a) the term is larger than any they have seen, b) they have not yet voted in term 11, and c) B's log is as up to date as theirs
  • when the leader sees a message with a term larger than its own, it reverts to follower and votes yes as well
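
The vote decision in the sequence above can be sketched in Python. This is a minimal illustration under simplifying assumptions, not a full Raft implementation: `Node` and `handle_request_vote` are hypothetical names, and each log is reduced to the term and index of its last entry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    current_term: int
    last_log_term: int
    last_log_index: int
    voted_for: Optional[str] = None
    role: str = "follower"


def handle_request_vote(node, cand_name, cand_term, cand_last_log_term, cand_last_log_index):
    """Return True if `node` grants its vote to the candidate."""
    # A candidate with a stale term is always rejected.
    if cand_term < node.current_term:
        return False
    # A higher term makes any node (including a leader) adopt it and become a follower.
    if cand_term > node.current_term:
        node.current_term = cand_term
        node.voted_for = None
        node.role = "follower"
    # One vote per term, and only for a candidate whose log is at least as up to date.
    not_voted_yet = node.voted_for in (None, cand_name)
    log_ok = (cand_last_log_term, cand_last_log_index) >= (node.last_log_term, node.last_log_index)
    if not_voted_yet and log_ok:
        node.voted_for = cand_name
        return True
    return False


# Scenario from the list: term 10, identical logs, A is the leader, B starts term 11.
nodes = {name: Node(name, 10, 10, 5) for name in "ABCDE"}
nodes["A"].role = "leader"
b = nodes["B"]
b.current_term, b.role, b.voted_for = 11, "candidate", "B"
votes = {
    n.name: handle_request_vote(n, "B", 11, b.last_log_term, b.last_log_index)
    for n in nodes.values()
    if n.name != "B"
}
```

With these inputs every entry in `votes` is `True`, and A ends up as a follower in term 11.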

Raft, like many similar protocols, has no notion of a "sticky" leader: nothing stops the cluster from replacing the leader even while the leader is healthy.

If Raft is implemented exactly as the paper describes, a common issue is that a disconnected but alive follower triggers a new election every time it reconnects. While disconnected, the follower keeps timing out and becoming a candidate, and it increments its term each time because it cannot win an election. When it reconnects, its term is suddenly larger than the cluster's current term, so a new election happens (which it will not win, as it does not have the latest log entries).
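
A small sketch of that term inflation, assuming the rest of the cluster stays in the same term while the follower is isolated. The function name and the `(last_term, last_index)` log summary are illustrative, not part of any Raft library.

```python
def reconnect_outcome(cluster_term, cluster_last_log, isolated_last_log, failed_elections):
    """Describe what happens when an isolated follower rejoins the cluster.

    Logs are summarised as (last_term, last_index) tuples.
    """
    # Every failed election while isolated bumps the follower's term by one.
    isolated_term = cluster_term + failed_elections
    # Any node that sees a higher term steps down, so the healthy leader is deposed.
    leader_steps_down = isolated_term > cluster_term
    # The returning node can only collect votes if its log is at least as up to date.
    isolated_can_win = isolated_last_log >= cluster_last_log
    return isolated_term, leader_steps_down, isolated_can_win


# Cluster at term 10 with 8 log entries; the follower missed 3 entries and timed out 4 times.
print(reconnect_outcome(10, (10, 8), (10, 5), 4))  # (14, True, False)
```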

In practice, it is common to add a "pre-candidate" check (often called Pre-Vote): the would-be candidate checks that it can reach a majority of nodes before bumping its term number. This prevents unnecessary elections when the network is unstable.
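
A rough sketch of such a check, with hypothetical function names. It assumes `responses` holds one boolean per peer that actually replied, so unreachable peers are simply absent from the list.

```python
def pre_vote_passes(responses, cluster_size):
    """True if this node plus the peers that said yes form a majority."""
    votes = 1 + sum(1 for granted in responses if granted)  # count our own vote
    return votes > cluster_size // 2


def maybe_start_election(current_term, responses, cluster_size):
    """Return the term to use next: bumped only if the pre-check succeeded."""
    if pre_vote_passes(responses, cluster_size):
        return current_term + 1
    return current_term


# An isolated node in a 5-node cluster gets no replies and keeps its term.
print(maybe_start_election(10, [], 5))            # 10
# With two peers agreeing, it has 3 of 5 votes and may start a real election.
print(maybe_start_election(10, [True, True], 5))  # 11
```

Because the term only changes after the check passes, an isolated node returns to the cluster with the same term it left with.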

Navitas28 On

No. Because the incumbent leader is still healthy, the other followers would not vote for the new candidate and elect it as the leader. They keep receiving the leader's heartbeats, so they know it is still in charge. The only follower that is unaware of this is the one that lost its connection.

Raft ensures that there is only one leader at a time. When a follower becomes a candidate, it sends a RequestVote message to every other node in the cluster. Each node votes for at most one candidate per term and rejects candidates whose term is lower than its own. If the candidate wins votes from a majority of the cluster, it becomes the leader.

In the scenario you described, the disconnected follower will send a RequestVote message to all other nodes in the cluster. However, the other nodes will not vote for the disconnected follower because they know that the previous leader is still the leader. The disconnected follower will not receive a majority of votes, and it will not become the new leader.
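
One way a node can "know" that the leader is still in charge is to ignore vote requests that arrive shortly after a heartbeat. The sketch below assumes the implementation applies such a rule; the function name and the millisecond parameters are hypothetical.

```python
def should_consider_vote_request(now_ms, last_heartbeat_ms, min_election_timeout_ms):
    """Ignore RequestVote while a leader has been heard from recently."""
    return (now_ms - last_heartbeat_ms) >= min_election_timeout_ms


# Heartbeat seen 50 ms ago with a 150 ms minimum timeout: the request is ignored.
print(should_consider_vote_request(1050, 1000, 150))  # False
# Nothing heard for 300 ms: the node is willing to evaluate the request.
print(should_consider_vote_request(1300, 1000, 150))  # True
```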

The Raft algorithm is designed to ensure that the cluster always has a leader. This is important because the leader is responsible for coordinating the activities of the cluster. Without a leader, the cluster would be unable to function properly.

Reference: Raft: A Consensus Algorithm for Replicated State Machines

See Sections 3.4, 4.2 and 5.1 of the linked paper for more on leader election.

Two timing parameters are involved. The leader sends heartbeats at a fixed heartbeat interval. Each follower runs an election timeout: the amount of time it will wait for a heartbeat before considering the leader dead. If no heartbeat arrives within that time, the follower becomes a candidate and starts a new election.

The same election timeout applies to a candidate: if it does not win the election before the timeout expires, it considers the election failed and starts a new one.

The heartbeat interval is much shorter than the election timeout, so that a leader that is only briefly unavailable or slow is not mistaken for a dead one.

With these two timing parameters, Raft can replace a failed leader without holding needless elections while the leader is only temporarily unavailable.
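
The relationship between the two timings can be sketched as follows. The names are illustrative, the 50 ms send interval is an assumed value, and the 150 to 300 ms range is the example range the Raft paper gives for randomized election timeouts.

```python
import random

HEARTBEAT_INTERVAL_MS = 50           # assumed: how often the leader sends heartbeats
WAIT_BEFORE_ELECTION_MS = (150, 300) # how long a node waits before starting an election


def new_wait_before_election(rng):
    """Pick a randomized wait so that nodes rarely time out at the same moment."""
    low, high = WAIT_BEFORE_ELECTION_MS
    return rng.uniform(low, high)


def leader_presumed_dead(ms_since_last_heartbeat, wait_before_election_ms):
    """True once the node has waited long enough to start an election."""
    return ms_since_last_heartbeat >= wait_before_election_ms


rng = random.Random(0)  # seeded only to make the sketch repeatable
wait_ms = new_wait_before_election(rng)
# Missing two heartbeats in a row (100 ms) is not enough to trigger an election.
print(leader_presumed_dead(2 * HEARTBEAT_INTERVAL_MS, wait_ms))
```

Keeping the send interval well below the shortest possible wait means a couple of delayed heartbeats do not cause an election.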

Sections 5.2, 5.3 and 5.4 have the necessary information about the timeouts.