How to design a long-running process that can continue after an outage?

22 Views Asked by Bhoomtawath Plinsut At 09 February 2024 at 04:17

How can I make a long-running job that fails in the middle of processing continue from the last successful operation?

For example, there's a service that has to notify 1 million users via AWS SNS. The service has to send a request to SNS for each user, one by one. If the service dies while trying to notify the 999,999th user, how can I make the restarted service resume with the last 2 users instead of starting over?

My idea is to use Redis for idempotency, so that each user is notified exactly once. The whole operation is treated as a single message on a queue.

The processing service will:

1. Receive a message to notify users.
2. Query the users that match the criteria of the job.
3. Check whether the user id is greater than the user id stored in Redis.
   3.1 If it is less than the id in Redis, skip the user.
   3.2 If it is greater than the id in Redis, send an SNS notification for the user.
   3.3 Update the user id in Redis.
4. Continue to the next user.
5. Once the job is completed, ACK the message.
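
The steps above can be sketched roughly as follows. This is only a minimal illustration, assuming users are processed in ascending id order. `InMemoryCheckpoint` is a hypothetical stand-in for Redis, `notify` is a hypothetical callable standing in for the SNS request, and `run_job` is a name made up for this example.

```python
from typing import Callable, Dict, Iterable


class InMemoryCheckpoint:
    """Hypothetical stand-in for Redis: last processed user id per job."""

    def __init__(self) -> None:
        self._store: Dict[str, int] = {}

    def get(self, job_id: str) -> int:
        # 0 means "nothing processed yet" (assumes user ids start at 1).
        return self._store.get(job_id, 0)

    def set(self, job_id: str, user_id: int) -> None:
        self._store[job_id] = user_id


def run_job(
    job_id: str,
    user_ids: Iterable[int],
    checkpoint: InMemoryCheckpoint,
    notify: Callable[[int], None],
) -> int:
    """Notify every user whose id is above the stored checkpoint.

    Returns the number of notifications sent in this run.
    """
    last_done = checkpoint.get(job_id)
    sent = 0
    # Ascending order is required for a single "last id" checkpoint to work.
    for user_id in sorted(user_ids):
        if user_id <= last_done:
            continue  # already handled by a previous run
        notify(user_id)                  # step 3.2: send the notification
        checkpoint.set(job_id, user_id)  # step 3.3: record progress
        sent += 1
    return sent
```

If the checkpoint already holds 3 for a job, calling `run_job` with users 1 to 5 would only notify users 4 and 5. The caller would ACK the queue message after `run_job` returns.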

This solution seems to work, but after sending a notification for a user, the service could fail before it updates Redis. On restart, that user would be notified again, so users can receive the notification more than once.
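
The failure window can be reproduced with a small simulation. This is a sketch under assumptions, not real Redis or SNS code: a dict stands in for the Redis checkpoint, a list records the "sent" notifications, and the crash is injected artificially with a hypothetical `crash_before_checkpoint_for` parameter.

```python
from typing import List, Optional


def simulate_crash_and_restart() -> List[int]:
    """Return the ids notified across a crashed run and its restart."""
    checkpoint = {"last": 0}  # stands in for the id stored in Redis
    sent: List[int] = []      # stands in for notifications delivered by SNS

    def run(users: List[int], crash_before_checkpoint_for: Optional[int] = None) -> None:
        for user_id in users:
            if user_id <= checkpoint["last"]:
                continue  # already recorded as done
            sent.append(user_id)  # notification goes out
            if user_id == crash_before_checkpoint_for:
                # The process dies after sending but before updating the checkpoint.
                raise RuntimeError("crashed before checkpoint update")
            checkpoint["last"] = user_id

    users = [1, 2, 3]
    try:
        run(users, crash_before_checkpoint_for=2)
    except RuntimeError:
        pass  # the service is restarted and the message is redelivered
    run(users)
    return sent


print(simulate_crash_and_restart())  # [1, 2, 2, 3]
```

User 2 appears twice, so sending first and checkpointing second gives at-least-once delivery rather than exactly-once. Swapping the two steps would avoid duplicates but could skip a user if the crash happens after the checkpoint update and before the send.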
