I have a batch job that runs every 15 minutes. Most of the time it runs in under a minute, but occasionally it can take an hour.
What I have right now is a Jenkins job that executes the batch with concurrency set to 1.
I'm looking to migrate to AWS Batch with scheduled jobs; however, I don't see any easy way to add a lock system that guarantees only one instance of a job runs at a time.
Ideally it would also not queue a new job if the previous one has not finished running, but that's less important.
I also have around 20-30 different jobs with the same requirements but with different schedules and timeouts.
The options I see possible are:
- A lock system in a database, like DynamoDB. When the job starts, it checks the lock and, if the lock is held, it doesn't run. The lock is cleared after the job's timeout in case the job breaks or never finishes.
- One queue per batch job, where each job requests the same CPU and memory as the compute environment provides, which should in theory limit execution to one at a time. However, creating a new queue for each job seems cumbersome.
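For reference, the first option can be sketched with a DynamoDB conditional write: the lock is taken only if no row exists or the previous one has expired. This is just a sketch of the idea; the table name `job-locks` and the attribute names are my own placeholders, and the helper only builds the `put_item` parameters so the surrounding code would still need a real boto3 client.

```python
import time

def lock_request(job_name, timeout_seconds, now=None):
    """Build put_item kwargs for a DynamoDB client (low-level boto3 format)."""
    now = time.time() if now is None else now
    return {
        "TableName": "job-locks",  # placeholder table name
        "Item": {
            "job_name": {"S": job_name},
            "expires_at": {"N": str(int(now + timeout_seconds))},
        },
        # Take the lock only if it is free or the previous holder timed out.
        "ConditionExpression": "attribute_not_exists(job_name) OR expires_at < :now",
        "ExpressionAttributeValues": {":now": {"N": str(int(now))}},
    }

# Intended usage (not executed here):
#   client = boto3.client("dynamodb")
#   try:
#       client.put_item(**lock_request("nightly-report", timeout_seconds=3600))
#   except client.exceptions.ConditionalCheckFailedException:
#       pass  # lock is held: skip this run
```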
Is there any other easier option?
Note that AWS Batch is not the same as ECS scheduled tasks.
I would set up a Step Functions workflow for this logic and, if the pre-flight checks pass, submit the job from the workflow. The checks should be handled by Lambda functions, which keeps the scheduling logic outside of your application code. DynamoDB is great for keeping track of which job types are running.
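A minimal sketch of that pre-flight Lambda, with the lock lookup injected as a callable so the decision logic is testable without AWS calls (in the deployed function it would wrap a DynamoDB `get_item`; the event field `job_name` is an assumed input from the Step Functions state):

```python
import time

def should_submit(job_name, get_lock, now=None):
    """Return True if no live lock exists for job_name.

    get_lock(job_name) -> lock expiry as epoch seconds, or None if no lock row.
    """
    now = time.time() if now is None else now
    expires_at = get_lock(job_name)
    return expires_at is None or expires_at < now

def handler(event, context):
    # Placeholder lock lookup: a real deployment would read DynamoDB here.
    job = event["job_name"]
    return {"submit": should_submit(job, get_lock=lambda name: None)}
```

The Choice state after this task would then either run the SubmitJob step or end the execution, which also covers the "don't queue a new job behind a running one" requirement.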
Last thing -> for tracking whether a job SUCCEEDED or FAILED, instead of polling the Batch API you should leverage the Batch job state change events delivered through CloudWatch Events/EventBridge to clear the lock in DDB.
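The lock-clearing side can be sketched like this. EventBridge "Batch Job State Change" events carry the job name and status in the `detail` object; when the job reaches a terminal state, delete its lock row. The handler name and the commented-out `delete_item` target table are assumptions matching the earlier sketch:

```python
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}

def lock_to_clear(event):
    """Return the job name whose lock should be cleared, or None."""
    detail = event.get("detail", {})
    if detail.get("status") in TERMINAL_STATES:
        return detail.get("jobName")
    return None

def handler(event, context):
    job = lock_to_clear(event)
    if job is not None:
        # Real code would call, e.g.:
        #   dynamodb.delete_item(TableName="job-locks",
        #                        Key={"job_name": {"S": job}})
        pass
    return {"cleared": job}
```

The EventBridge rule would match `source: aws.batch` with detail-type "Batch Job State Change" and target this Lambda.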