What major problems would I have if I use Raft Consensus Algorithm in 50+ pods?

148 Views Asked by zgli20 At 25 December 2023 at 01:57

I've been researching the Raft consensus algorithm for a problem that I'm trying to solve, and I've decided that Raft is indeed the way to go.

However, due to the nature of the environment I'm working in, there are at least 20 pods at any one time, and with autoscaling it goes all the way up to 100+. For simplicity, let's say 50+ pods.

Since the leader election mechanism is quite "flexible", having more pods than the examples on the internet (3, 5, 7, etc.) naturally makes me think that doing this in an environment with 20, 50, or 100 pods is tricky or risky.

I could not find any information about the risks of using Raft at such a scale, though. I know that etcd uses Raft internally and that K8S relies on etcd, so maybe it won't be as much of an issue as I'm thinking. So my question is: are there any potential major drawbacks to using Raft in a high-scale environment with many pods?

As side information, I'll be using this in a K8S environment with NodeJS microservices. Depending on the availability of service discovery, I'll either use simple TCP connections or an intermediary Redis pub-sub for the communication.

Thanks!


There is 1 best solution below

AndrewR On 26 December 2023 at 22:36

I don't think Raft is going to be efficient here, for two reasons:

1. With many nodes joining and leaving, reconfiguring the cluster generates a lot of noise. Progress requires a majority of servers to be available: with 100 nodes, 51 must acknowledge an entry before it commits, even when that entry is itself a reconfiguration (as described in the Raft paper).
2. I am not sure why anyone would need a Raft cluster with tens of nodes, outside of the very specific case of scaling out eventually consistent reads. Having tens of nodes implies that the same information is stored on every node, which may be quite a waste of space.
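To make the majority arithmetic concrete, here is a minimal sketch in TypeScript. The function names `quorumSize` and `faultTolerance` are made up for illustration and do not come from any Raft library; the only rule assumed is the standard one that a Raft commit needs more than half of the voting members.

```typescript
// Majority quorum size for a cluster of n voting members (illustrative helper).
function quorumSize(n: number): number {
  if (n < 1 || Math.floor(n) !== n) {
    throw new Error("cluster size must be a positive integer");
  }
  return Math.floor(n / 2) + 1;
}

// How many members can be unavailable while the cluster can still commit.
function faultTolerance(n: number): number {
  return n - quorumSize(n);
}

// Compare the usual small clusters with the pod counts from the question.
for (const n of [3, 5, 7, 50, 100]) {
  console.log(
    `n=${n} quorum=${quorumSize(n)} tolerated failures=${faultTolerance(n)}`
  );
}
```

The tolerated failure count grows with the cluster, but so does the number of acknowledgements the leader has to collect for every single entry, and every member still holds a full copy of the log.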

May I recommend looking into the PacificA approach? Kafka's replication works along these lines, and it is a very scalable, time-tested system.

The idea behind PacificA is simple: decouple the control plane from the data plane. Tens of nodes sit in the data plane, and 3, 5, or 7 nodes form the control plane. A more detailed description is here: https://kafka.apache.org/documentation/#design_replicatedlog
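A rough sketch of that split, in TypeScript, might look like the following. Everything here is hypothetical and simplified: `GroupConfig`, `ConfigManager`, and `isCommitted` are names invented for this example, not Kafka or PacificA APIs, and the config manager is a plain in-memory object standing in for what would really be a small consensus group. Primary failover is deliberately not modeled.

```typescript
// Hypothetical shape of a replica group's configuration.
interface GroupConfig {
  version: number;   // bumped on every membership change
  primary: string;
  inSync: string[];  // replicas that must acknowledge each write (primary included)
}

// Stand-in for the control plane. In a real system this state would be
// replicated by a 3, 5, or 7 node consensus group; here it is just an object.
class ConfigManager {
  private config: GroupConfig;

  constructor(initial: GroupConfig) {
    this.config = { ...initial, inSync: initial.inSync.slice() };
  }

  current(): GroupConfig {
    return { ...this.config, inSync: this.config.inSync.slice() };
  }

  // Compare-and-set: the change is accepted only if the proposer saw the
  // latest version, so stale proposals are rejected.
  removeReplica(expectedVersion: number, replica: string): boolean {
    if (expectedVersion !== this.config.version) return false;
    if (replica === this.config.primary) return false; // failover not modeled
    if (this.config.inSync.indexOf(replica) === -1) return false;
    this.config = {
      version: this.config.version + 1,
      primary: this.config.primary,
      inSync: this.config.inSync.filter((r) => r !== replica),
    };
    return true;
  }
}

// Data plane rule: a write is committed once every in-sync replica acked it.
function isCommitted(config: GroupConfig, acks: string[]): boolean {
  return config.inSync.every((r) => acks.indexOf(r) !== -1);
}

const manager = new ConfigManager({
  version: 1,
  primary: "pod-0",
  inSync: ["pod-0", "pod-1", "pod-2"],
});
const acks = ["pod-0", "pod-1"]; // pod-2 is unresponsive

console.log(isCommitted(manager.current(), acks)); // false: pod-2 has not acked
manager.removeReplica(1, "pod-2");                 // control plane shrinks the group
console.log(isCommitted(manager.current(), acks)); // true: remaining replicas acked
```

The point of the design is that only membership changes go through the small consensus group, while ordinary writes involve just the primary and its in-sync replicas, so the number of data-plane pods can grow without enlarging the quorum.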

As a side note, when a cluster needs more than 7 nodes, PacificA is usually a great choice.
