Is the attempts argument passed to retry_on respected with multiple workers?


When retry_on causes a delayed job to be retried, I see run_at change on the record in the database, but the attempts column does not get incremented until the job has been retried as many times as the attempts argument I passed to retry_on. (That may be confusing: there are two different things here called attempts, the column in the db and the argument passed to retry_on.)
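
For concreteness, here is a sketch of the kind of job I mean (the job class and the exception are hypothetical):

    class SyncOrderJob < ApplicationJob
      # Allow up to 3 executions in total when this error is raised, waiting
      # 5 seconds between attempts; only after those are exhausted does the
      # exception bubble up to the backend (DelayedJob here), which is when
      # I see the db attempts column move.
      retry_on ExternalApi::TimeoutError, wait: 5.seconds, attempts: 3

      def perform(order_id)
        ExternalApi.sync(order_id) # hypothetical call that can time out
      end
    end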

How does this work when I have multiple workers? I see in the ActiveJob code that the number of attempts for retry_on is tracked through exception_executions, which appears to be just a hash stored in RAM.
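
Both counters are readable on a job instance, and on a fresh one they start at zero (continuing the hypothetical job above):

    job = SyncOrderJob.new("order-123")
    job.executions           # => 0   total times this job has run
    job.exception_executions # => {}  per-retry_on counters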

Does anything prevent different workers from picking up the job on the retry?

If not, it seems like if I pass attempts: 3 to retry_on and I have 10 workers, then the job could end up getting retried as many as 30 times before it is reported as an error in DelayedJob.

Is that right? If so, is it a bug?


1 Answer

Answered by C.M.:

In ActiveJob, both executions and exception_executions are persisted with the rest of the job data, not just kept on the in-memory job instance. If you're using something like delayed_job, that serialized payload is stored in the handler column of the delayed_jobs table.
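
One way to check this (assuming the Active Record backend for delayed_job) is to enqueue a job and inspect the row's handler, which is the YAML-serialized wrapper around the ActiveJob payload. Using the hypothetical SyncOrderJob from the question, abridged output might look like:

    SyncOrderJob.perform_later("order-123")

    puts Delayed::Job.last.handler
    # --- !ruby/object:ActiveJob::QueueAdapters::DelayedJobAdapter::JobWrapper
    # job_data:
    #   job_class: SyncOrderJob
    #   arguments:
    #   - order-123
    #   executions: 0
    #   exception_executions: {}
    #   ...

The exact keys vary by Rails version (exception_executions was added in Rails 6.0), but the counters live in the row, not in any worker's memory.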

When a job is created, its data (arguments, provider information, these counters, etc.) lives deserialized on the job instance. When the job is enqueued, ActiveJob serializes that data for your queueing adapter, and when a worker picks the job up, it is deserialized back onto a fresh job instance.
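
That round trip is visible in plain ActiveJob, independent of the queueing adapter (a sketch, again with the hypothetical job):

    data = SyncOrderJob.new("order-123").serialize
    data["executions"]           # => 0
    data["exception_executions"] # => {}

    # A worker rebuilds the job from that payload:
    job = ActiveJob::Base.deserialize(data)
    job.executions # => 0, restored from the payload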

This should not be an issue with multiple worker processes - each process reads from and writes to this serialized job data in ActiveJob's retry_on logic, so each process knows how many times the job has executed and/or raised an exception across all processes.
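
Concretely, a sketch of the handoff between workers, with hand-written counter values (the exception_executions key format is illustrative):

    # Payload as it might look after two failed runs on other workers:
    data = SyncOrderJob.new("order-123").serialize
    data["executions"] = 2
    data["exception_executions"] = { "[ExternalApi::TimeoutError]" => 2 }

    # Whichever worker dequeues this next rehydrates the counters, so
    # retry_on compares against the accumulated count, not a per-process one:
    job = ActiveJob::Base.deserialize(data)
    job.executions # => 2

So with attempts: 3 and 10 workers, the job should still stop after 3 executions in total, not 30.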