We're using:
- Standard Redis on Azure
- StackExchange.Redis
- RedLock.net
Our website has grown significantly over the last year or two, now serving ~250,000,000 uncached requests per month according to Cloudflare.
Sporadically, we see bursts of a couple of hundred exceptions from RedLock being unable to acquire a lock because it is in the Conflicted state.
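For context, our locking path is the standard RedLock.net acquisition pattern — a simplified sketch (resource name, timings, and logger are illustrative, not our actual code):

```csharp
// Simplified sketch of the lock acquisition path (names/timings illustrative).
using var redLock = await redLockFactory.CreateLockAsync(
    resource: "orders:12345",               // key being locked (illustrative)
    expiryTime: TimeSpan.FromSeconds(30));  // lock auto-expires if holder dies

if (redLock.IsAcquired)
{
    // ... critical section ...
}
else
{
    // In the failing bursts, redLock.Status is RedLockStatus.Conflicted:
    // the lock key is already held for that resource.
    logger.LogWarning("Lock not acquired: {Status}", redLock.Status);
}
```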
Our Redis cache typically:
- Runs at ~10% server load (I believe this metric reflects CPU)
- Runs close to 100% memory usage
My questions are:
- Is it recommended practice to run an entirely separate Redis server just for locking?
- Could using 100% memory in the Redis server cause issues when creating the locks?
When you look at your cache performance metrics, do the failures coincide with 100% memory usage? If so, I'll bet that's the culprit.
When Redis hits 100% memory, page faulting can occur, which slows down requests. See here for a description of the process. I could see a 5 ms RedLock.net time limit to acquire a lock expiring when memory pressure hits 100% and requests are delayed.
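If short latency spikes are the trigger, the blocking overload of `CreateLockAsync` can ride them out: it keeps retrying for a wait window instead of failing on the first attempt. A sketch — the timings are illustrative, not recommendations:

```csharp
// Retry for up to 2 seconds rather than failing on the first attempt.
// All timings are illustrative; tune them to your workload.
using var redLock = await redLockFactory.CreateLockAsync(
    resource: "orders:12345",                    // illustrative key
    expiryTime: TimeSpan.FromSeconds(30),        // lock auto-expires if holder dies
    waitTime: TimeSpan.FromSeconds(2),           // total time to keep retrying
    retryTime: TimeSpan.FromMilliseconds(100));  // delay between attempts
```

This masks the symptom rather than fixing memory pressure, but it would tell you quickly whether transient latency is the cause.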
I'd spin up a second Redis server just for locking and see whether that alleviates the problem, or scale up your existing cache and see if the issue persists. Scaling up is likely the easier experiment, since it requires no code changes.
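If you do try a dedicated locking instance, the code change is small: build the `RedLockFactory` from its own `ConnectionMultiplexer`, separate from your cache connection. A sketch assuming a single lock-only endpoint (hostname and connection string are illustrative):

```csharp
using System.Collections.Generic;
using RedLockNet.SERedis;
using RedLockNet.SERedis.Configuration;
using StackExchange.Redis;

// Separate multiplexer pointed at the lock-only Redis instance
// (hostname/credentials illustrative).
var lockConnection = ConnectionMultiplexer.Connect(
    "locks.redis.cache.windows.net:6380,ssl=true,password=...");

// RedLockFactory built only on the dedicated locking connection,
// leaving your existing cache multiplexer untouched.
var redLockFactory = RedLockFactory.Create(new List<RedLockMultiplexer>
{
    new RedLockMultiplexer(lockConnection)
});
```

Keeping locks on their own instance also isolates them from the cache's memory pressure, so lock writes aren't competing with a server at 100% memory.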