I've been researching the Raft consensus algorithm for a problem that I'm trying to solve, and I've decided that Raft is indeed the way to go.
However, due to the nature of the environment I'm working in, there are at least 20 pods at any time, and with autoscaling it goes all the way up to 100+. For simplicity, let's say 50+ pods.
Since the leader election mechanism is quite "flexible", having more pods than the examples on the internet (3, 5, 7, etc.) naturally makes me think that doing this in a 20, 50, or 100 pod environment is tricky or risky.
I could not find any information about the risks of using Raft at such a scale, though. I know that etcd uses Raft internally and that K8S relies on etcd, so maybe it won't be as much of an issue as I think. So my question is: are there any potential major drawbacks to using Raft in a large-scale environment with many pods?
As background, I'll be using this in a K8S environment with NodeJS microservices. Depending on the availability of service discovery, I'll either use simple TCP connections or an intermediary Redis pub-sub for the communication.
Thanks!
I don't think Raft is going to be efficient with these two use cases:
May I recommend looking into the PacificA approach? Kafka's replication works along these lines, and it is a very scalable, time-proven system.
The idea behind PacificA is simple: decouple the control plane from the data plane. Tens of nodes are in the data plane, and 3, 5, or 7 nodes form the control plane. A more detailed description is here: https://kafka.apache.org/documentation/#design_replicatedlog
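To make the difference concrete, here is a rough back-of-the-envelope sketch in TypeScript (chosen because the question mentions NodeJS). It only compares how many nodes take part in consensus in each layout. The helper names `majority` and `leaderFanOut`, the `Layout` type, and the 5/45 split are made up for illustration and are not part of any library. It assumes a standard majority-quorum Raft group in which the leader replicates each entry to every other voting member.

```typescript
// Size of a majority quorum for a consensus group with n voting members.
function majority(n: number): number {
  return Math.floor(n / 2) + 1;
}

// Number of followers the leader must replicate each log entry to
// (and send heartbeats to) in a group of n voting members.
function leaderFanOut(n: number): number {
  return n - 1;
}

// Hypothetical description of a deployment: how many pods vote in
// consensus, and how many only do data-plane work.
interface Layout {
  name: string;
  voters: number;
  workers: number;
}

const layouts: Layout[] = [
  // Every pod is a voting Raft member.
  { name: "single Raft group", voters: 50, workers: 0 },
  // Assumed split: a small control plane plus many data-plane pods.
  { name: "control/data split", voters: 5, workers: 45 },
];

for (const l of layouts) {
  const quorum = majority(l.voters);
  console.log(
    `${l.name}: voters=${l.voters}, workers=${l.workers}, ` +
      `quorum=${quorum}, leader fan-out=${leaderFanOut(l.voters)}, ` +
      `voter failures tolerated=${l.voters - quorum}`
  );
}
```

With every pod voting, the quorum and the leader's fan-out grow with the pod count, and autoscaling turns into a stream of membership changes. With the split, the consensus group stays small and fixed while the data-plane pods scale freely. Note that this sketch says nothing about how the data plane itself replicates; the Kafka document linked above covers that part.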
As a side note, when one needs a cluster with more than 7 nodes, PacificA is usually a great choice.