Redis Lettuce not distributing requests to all slave instances in v5.1

I'm currently using the Lettuce client in Micronaut for my Redis connection. The current version is 5.1, in which the lettuce-core library being used is 6.1.6-Release.

I have a high-TPS application in which the Redis response time was close to 1 second. The connection timeout is set to 180 ms, so I ruled out the connection itself taking the time.

Looking into this issue, I found through redis_thread1 that Lettuce caches the slave candidate for each master, so all the read load is directed to a single slave instance. We have 5 master nodes with 2 slaves each, amounting to 15 instances across the 5 clusters. My assumption was that if there are multiple slaves for the same master, the read load for that master should be distributed equally among its slaves.

The fix from the Lettuce team was to randomise the slave candidates, which seems to have been implemented in Lettuce version 5.2.0.
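
To make the difference concrete, here is a minimal sketch of the two selection strategies as I understand them. It is not Lettuce's actual code: the class and method names are hypothetical, the replica names are made up, and it only counts which replica each simulated read would land on.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; this is NOT Lettuce's implementation.
class ReplicaSelectionSketch {

    // Models a cached candidate: the first replica is picked once and reused for every read.
    static int[] countCached(List<String> replicas, int reads) {
        int[] hits = new int[replicas.size()];
        int cachedIndex = 0; // chosen once, never re-evaluated
        for (int i = 0; i < reads; i++) {
            hits[cachedIndex]++;
        }
        return hits;
    }

    // Models a randomised candidate: a replica index is drawn again for every read.
    static int[] countRandom(List<String> replicas, int reads, long seed) {
        Random rnd = new Random(seed);
        int[] hits = new int[replicas.size()];
        for (int i = 0; i < reads; i++) {
            hits[rnd.nextInt(replicas.size())]++;
        }
        return hits;
    }

    public static void main(String[] args) {
        // Two replicas of one master, names assumed for the example.
        List<String> replicas = Arrays.asList("replica-a", "replica-b");
        System.out.println("cached: " + Arrays.toString(countCached(replicas, 10_000)));
        System.out.println("random: " + Arrays.toString(countRandom(replicas, 10_000, 42L)));
    }
}
```

With the cached strategy every read is counted against the first replica, while the randomised one splits the reads roughly evenly. The per-read cost in this sketch is a single random index draw.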

So I'm trying to understand the issue. Since Lettuce is reactive, the application threads do not wait to get a Redis connection, hence the connection timeout doesn't kick in. But since Redis operations are processed one request at a time per slave, all the threads end up waiting for that one resource after getting the connection, and so the response time in our metrics is pretty high. It would be really helpful if anyone could validate my assumption.
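
A rough way to sanity-check this reasoning is a toy queueing calculation. The sketch below is only my assumption of how the backlog would build up, with made-up numbers (one request arriving every 1 ms, 2 ms of work per request) and hypothetical names. It models each slave as serving one request at a time and spreads requests round-robin; it does not measure Redis or Lettuce.

```java
// Toy model with assumed numbers; it does not measure Redis or Lettuce.
class ReplicaQueueSketch {

    // Worst latency (ms) seen when `requests` arrive every `arrivalGapMs`,
    // each needs `serviceMs` of work, and they are spread round-robin over
    // `replicas` servers that each process one request at a time.
    static long worstLatencyMs(int requests, long arrivalGapMs, long serviceMs, int replicas) {
        long[] freeAt = new long[replicas]; // time at which each replica becomes idle
        long worst = 0;
        for (int i = 0; i < requests; i++) {
            long arrival = i * arrivalGapMs;
            int r = i % replicas;
            long start = Math.max(arrival, freeAt[r]); // wait if the replica is still busy
            freeAt[r] = start + serviceMs;
            worst = Math.max(worst, freeAt[r] - arrival);
        }
        return worst;
    }

    public static void main(String[] args) {
        System.out.println("1 replica:  " + worstLatencyMs(1000, 1, 2, 1) + " ms worst latency");
        System.out.println("2 replicas: " + worstLatencyMs(1000, 1, 2, 2) + " ms worst latency");
    }
}
```

In this model, once a single slave receives requests faster than it can serve them, the wait keeps growing with every request even though no individual command is slow. That would be consistent with clean slow logs alongside high response times in the client metrics, but it remains an assumption until confirmed.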

Things I tried:

I went through this PR to allow randomization of slave candidates: https://github.com/lettuce-io/lettuce-core/commit/bdf304bd308f300b7535b33b93db8f8472dd8f33.

I went through the slow logs on the Redis server to make sure there is no issue at the Redis end and, as expected, no query was taking too long to execute.

Expectation: Since the documentation says that the changes are present in version 5.2, I was surprised to see the code present in version 5.1 itself.

Is this because it is pulling in the latest lettuce-core library, since all the code changes are in the core library?

Since we read only from slaves, I want to set readFrom to ReadFrom.ANY_REPLICA. I ran a perf test and it looks normal, but given the lack of documentation and resources on this, I wanted to know whether there are any issues I need to look out for when using this ReadFrom setting: at higher TPS, will the CPU overhead of choosing the slave candidate be considerably high?
