Resubmit failed condor jobs


When submitting condor jobs, a few of them typically fail for unknown reasons and have to be resubmitted. So I was wondering: what's the most efficient way to resubmit failed condor jobs, i.e. without having to fish them out one by one and resubmit each of them by hand?

I tried grepping for all the failure messages and extracting the job IDs, but manipulating that output is time-consuming.
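Rather than grepping free-form messages, the job IDs can be pulled out of the job event log in one pass. The sketch below assumes each job was submitted with a `log = ...` line and that the log uses the classic text format, where a "Job terminated" event has code `005` and is followed by a status line such as `(1) Normal termination (return value 1)`. The function name `failed_jobs` and the sample data are hypothetical, and the sketch only looks at terminate events, so held or removed jobs are not reported.

```python
import re
from typing import Dict, List, Optional

# Header of a "Job terminated" event (code 005), e.g.
# "005 (012.001.000) 2024-01-15 10:05:01 Job terminated."
TERMINATED = re.compile(r"^005 \((\d+)\.(\d+)\.\d+\)")
# Status line that follows the header, e.g.
# "(1) Normal termination (return value 1)" or "(0) Abnormal termination (signal 9)"
STATUS = re.compile(
    r"^\s*\(\d\) (Normal|Abnormal) termination \((?:return value|signal) (\d+)\)"
)


def failed_jobs(log_text: str) -> List[str]:
    """Return 'cluster.proc' IDs whose most recent termination was non-zero or abnormal."""
    last_ok: Dict[str, bool] = {}  # job ID -> did its latest run exit with code 0?
    pending: Optional[str] = None  # job ID of the 005 header we are inside
    for line in log_text.splitlines():
        header = TERMINATED.match(line)
        if header:
            # int() drops the zero padding: "012" -> 12
            pending = f"{int(header.group(1))}.{int(header.group(2))}"
        elif line.strip() == "...":
            pending = None  # "..." ends every event
        elif pending is not None:
            status = STATUS.match(line)
            if status:
                kind, value = status.group(1), int(status.group(2))
                # A later event for the same job overwrites an earlier one
                last_ok[pending] = kind == "Normal" and value == 0
                pending = None
    failed = [job for job, ok in last_ok.items() if not ok]
    # Sort numerically by (cluster, proc) rather than as strings
    return sorted(failed, key=lambda job: tuple(int(p) for p in job.split(".")))
```

For example, `failed_jobs(open("my_jobs.log").read())` (with a hypothetical log file name) returns a list like `["12.1", "12.2"]`, which can then be mapped back to the inputs those procs were given when building a new submission.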

Answer by Greg:

How is the job failing? If it fails with a non-zero exit code, try setting

max_retries = 5

in the submit description file you pass to condor_submit. That way, if the job exits with a non-zero exit code, condor will re-run it up to five more times, stopping as soon as it exits with code zero.
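To make the counting concrete, here is a toy model of that retry policy in plain Python. It does not call HTCondor; `run_with_retries` and `run_once` are hypothetical names, and the only assumption carried over from the answer is that exit code 0 means success and anything else triggers a re-run, up to the retry limit.

```python
from typing import Callable, List


def run_with_retries(run_once: Callable[[], int], max_retries: int = 5) -> List[int]:
    """Toy model of a retry policy: run once, re-run on a non-zero exit code.

    Returns the exit code of every attempt, in order; there are at most
    1 + max_retries attempts (the first run plus the retries).
    """
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    codes: List[int] = []
    for _ in range(1 + max_retries):
        code = run_once()
        codes.append(code)
        if code == 0:  # success: stop retrying
            break
    return codes


# A flaky job that fails twice, then succeeds on the third attempt
outcomes = iter([1, 1, 0])
print(run_with_retries(lambda: next(outcomes)))  # [1, 1, 0]

# A job that always fails is run six times in total (1 run + 5 retries)
print(run_with_retries(lambda: 1, max_retries=5))  # [1, 1, 1, 1, 1, 1]
```

The point of the model is the bound: a retry limit of five allows six executions at most, and a job that keeps failing after that still needs to be found and handled some other way.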