How to ensure RabbitMQ message is not duplicated when handling concurrent messages in multiple pods?


I have a .NET worker service that processes jobs using RabbitMQ messages. I use the EasyNetQ NuGet package for message handling. Multiple instances of this worker service run across a number of Kubernetes pods. A received RabbitMQ message is not acknowledged until its handler method has finished, so if a pod terminates in the middle of executing a handler, RabbitMQ redelivers the message.

Some messages are handled in parallel to reduce the time it takes for each job to complete.

When the JobStarted message is fired, four handlers (PartOneHandler, PartTwoHandler, PartThreeHandler, and PartFourHandler) begin processing different tasks. When each part is finished, it sets its respective state in the database (IsPartOneFinished, IsPartTwoFinished, IsPartThreeFinished, and IsPartFourFinished). SecondEntityHandler cannot start processing its task until all four parts are completed.

At first, I coded PartFinishedHandler to check whether all four states were set to true and then fire the AllFirstEntityPartsFinished message. However, that resulted in AllFirstEntityPartsFinished being fired multiple times, because several part handlers may send the PartFinished message at almost the same time. I then added a fifth database state field called AllPartsCompleted. PartFinishedHandler now executes a single database query:

UPDATE Jobs
SET AllPartsCompleted = 1
WHERE JobId = id
  AND IsPartOneFinished = 1
  AND IsPartTwoFinished = 1
  AND IsPartThreeFinished = 1
  AND IsPartFourFinished = 1
  AND AllPartsCompleted = 0

It then checks whether the affected row count is not 0, and publishes AllFirstEntityPartsFinished only in that case.
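To make the claim step concrete, here is a minimal sketch of it. It uses Python with an in-memory SQLite database purely as a stand-in for my .NET worker and real database; the table layout and the helper names make_demo_db and try_claim_completion are assumptions for illustration, not my actual code.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory Jobs table (assumed schema) with one job whose four parts are done."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE Jobs (
            JobId INTEGER PRIMARY KEY,
            IsPartOneFinished INTEGER NOT NULL DEFAULT 0,
            IsPartTwoFinished INTEGER NOT NULL DEFAULT 0,
            IsPartThreeFinished INTEGER NOT NULL DEFAULT 0,
            IsPartFourFinished INTEGER NOT NULL DEFAULT 0,
            AllPartsCompleted INTEGER NOT NULL DEFAULT 0
        )
        """
    )
    conn.execute(
        "INSERT INTO Jobs (JobId, IsPartOneFinished, IsPartTwoFinished, "
        "IsPartThreeFinished, IsPartFourFinished) VALUES (1, 1, 1, 1, 1)"
    )
    conn.commit()
    return conn


def try_claim_completion(conn, job_id):
    """Return True only for the one caller that flips AllPartsCompleted from 0 to 1."""
    cur = conn.execute(
        """
        UPDATE Jobs
        SET AllPartsCompleted = 1
        WHERE JobId = ?
          AND IsPartOneFinished = 1
          AND IsPartTwoFinished = 1
          AND IsPartThreeFinished = 1
          AND IsPartFourFinished = 1
          AND AllPartsCompleted = 0
        """,
        (job_id,),
    )
    conn.commit()
    # rowcount is the number of rows the UPDATE modified.
    return cur.rowcount == 1


conn = make_demo_db()
# Four PartFinished messages arrive for the same job; only the first one wins.
results = [try_claim_completion(conn, 1) for _ in range(4)]
print(results)  # [True, False, False, False]
```

Because the condition and the write happen in one statement, only one handler sees a non-zero row count, no matter how many PartFinished messages arrive.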

This stopped AllFirstEntityPartsFinished from being fired multiple times. However, I noticed that in very rare cases, when the pod is terminated in the middle of PartFinishedHandler execution (after setting AllPartsCompleted to 1, but before publishing the AllFirstEntityPartsFinished message), the job gets stuck. RabbitMQ resends the PartFinished message, but PartFinishedHandler ignores it because AllPartsCompleted is already set to 1, so the UPDATE affects 0 rows.
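The failure can be reproduced with a small simulation. Again this is only a sketch in Python with in-memory SQLite, not my real EasyNetQ code: the schema, the PodTerminated exception, the crash_before_publish flag, and the published list (standing in for the broker) are all hypothetical names made up for the illustration.

```python
import sqlite3


class PodTerminated(Exception):
    """Stands in for the pod dying in the middle of the handler."""


def make_db():
    """Assumed schema: one job whose four parts are already finished."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Jobs (JobId INTEGER PRIMARY KEY, "
        "IsPartOneFinished INTEGER, IsPartTwoFinished INTEGER, "
        "IsPartThreeFinished INTEGER, IsPartFourFinished INTEGER, "
        "AllPartsCompleted INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute(
        "INSERT INTO Jobs (JobId, IsPartOneFinished, IsPartTwoFinished, "
        "IsPartThreeFinished, IsPartFourFinished) VALUES (1, 1, 1, 1, 1)"
    )
    conn.commit()
    return conn


def handle_part_finished(conn, job_id, published, crash_before_publish=False):
    """Mimics PartFinishedHandler: claim completion, then publish."""
    cur = conn.execute(
        "UPDATE Jobs SET AllPartsCompleted = 1 "
        "WHERE JobId = ? AND IsPartOneFinished = 1 AND IsPartTwoFinished = 1 "
        "AND IsPartThreeFinished = 1 AND IsPartFourFinished = 1 "
        "AND AllPartsCompleted = 0",
        (job_id,),
    )
    conn.commit()  # the flag is durable from this point on
    if cur.rowcount == 0:
        return  # parts missing, or completion was already claimed
    if crash_before_publish:
        raise PodTerminated()  # dies after the commit, before the publish
    published.append(("AllFirstEntityPartsFinished", job_id))


conn = make_db()
published = []

try:
    handle_part_finished(conn, 1, published, crash_before_publish=True)
except PodTerminated:
    pass  # the message was never acknowledged, so it is redelivered

# Redelivery of the same PartFinished message to another pod.
handle_part_finished(conn, 1, published)

print(published)  # [] : nothing was ever published, the job is stuck
```

The database commit and the publish are two separate steps, so a termination between them leaves the flag set with no message sent, and the redelivered message can no longer tell that the publish is still owed.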

How can I keep these four parts parallelized while also ensuring that the AllFirstEntityPartsFinished message is fired only once and that the job does not get stuck if the pod is terminated in the middle of PartFinishedHandler?
