What service quotas could prevent scaling AWS Batch?


I need to run 500 jobs, but they keep getting stuck in the RUNNABLE state. When my jobs do start, they run to completion, so there is nothing wrong with their configuration. Maybe it's a us-east-1 capacity issue, or maybe it's the service quotas tied to my account.

What service quotas apply in this scenario? Below are the quotas I suspect, with their applied and default values. I have submitted requests to increase each of these.

| Service | Quota name | Applied account-level quota value | AWS default quota value |
|---------|------------|-----------------------------------|-------------------------|
| EC2 | Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances | 32 CPUs | 5 |
| EC2 | All Standard (A, C, D, H, I, M, R, T, Z) Spot Instance Requests | 32 CPUs | 5 |
| EC2 | New Reserved Instances per month | 50 | 20 |
| EC2 | EC2-VPC Elastic IPs | 5 | 5 |

(Batch jobs require a public IP to talk to ECR)


Configuration:

  • AWS Batch - Compute Environment - Max vCPUs: 5000
  • AWS Batch - Job Definition - Fargate - vCPUs: 4
  • AWS Batch - Job Definition - Fargate - Memory: 8GB
  • AWS Batch - Job Definition - Fargate - Ephemeral Storage: 100GB



There are 1 best solutions below

Kermit On

The quotas in question were:

  • Fargate On-Demand vCPU resource count
  • Fargate Spot vCPU resource count

As shown by the "utilization" on the right:

enter image description here

At the time, I had dropped down to 2 vCPU per job in an attempt to get more instances running:

15 running x 2 vCPU = 30 vCPU quota
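
The same arithmetic generalizes: the number of Fargate jobs that can run at once is roughly the vCPU quota divided by the vCPUs per job, rounded down. Here is a small sketch, assuming the quota was 30 vCPUs as the figures above imply. The function name is made up for illustration.

```python
def max_concurrent_jobs(vcpu_quota: int, vcpus_per_job: int) -> int:
    """Jobs that fit under a vCPU quota at the same time (rounded down)."""
    if vcpus_per_job <= 0:
        raise ValueError("vcpus_per_job must be positive")
    return vcpu_quota // vcpus_per_job


quota = 30  # assumed from "15 running x 2 vCPU = 30 vCPU"

print(max_concurrent_jobs(quota, 2))  # 15, matching what was observed
print(max_concurrent_jobs(quota, 4))  # 7 with the original 4 vCPU job definition

# vCPUs needed to run all 500 jobs at once at 4 vCPUs each
print(500 * 4)  # 2000
```

So the compute environment's 5000 max vCPUs was never the binding limit. The much smaller Fargate quota was.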

It is infuriating that the services enforce quotas, but don't inform the user when they are being enforced.
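
Since nothing in the job status points at the quota, one way to check is to query the Service Quotas API yourself. This is a minimal sketch using boto3, assuming the service code is `fargate` and that credentials are already configured. The helper names are hypothetical and the sample values are made up to mirror the shape of the API response.

```python
def pick_vcpu_quotas(quotas):
    """Keep only quotas whose name mentions vCPU, as {name: value}."""
    return {
        q["QuotaName"]: q["Value"]
        for q in quotas
        if "vcpu" in q["QuotaName"].lower()
    }


def fetch_fargate_quotas(region="us-east-1"):
    """List applied Fargate quotas via the Service Quotas API (needs AWS credentials)."""
    import boto3  # imported here so the filter above works without boto3 installed

    client = boto3.client("service-quotas", region_name=region)
    paginator = client.get_paginator("list_service_quotas")
    quotas = []
    # "fargate" as the service code is an assumption; confirm it with list_services()
    for page in paginator.paginate(ServiceCode="fargate"):
        quotas.extend(page["Quotas"])
    return quotas


# Offline illustration with made-up values shaped like the API response
sample = [
    {"QuotaName": "Fargate On-Demand vCPU resource count", "Value": 30.0},
    {"QuotaName": "Fargate Spot vCPU resource count", "Value": 30.0},
]
print(pick_vcpu_quotas(sample))
```

With credentials in place, `pick_vcpu_quotas(fetch_fargate_quotas())` should print the applied limits for your account. As far as I know, this call returns the limits only, not the current utilization shown in the console.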