Task takes too much time pending on ECS


I've been dealing with a weird problem for a few days. I'm implementing the ECS logic to drain instances on termination (specifically on a Spot interruption notice) using the ECS_ENABLE_SPOT_INSTANCE_DRAINING=true env var on the ecs-agent.

The process works fine: when an interruption notice arrives, ECS drains the instance and moves the containers to another one. The problem is that if the target instance has never started that image before, the task takes too long to start (about 3 minutes, while the Spot interruption notice only gives 2 minutes), which causes availability issues. If the image has already been started on that instance, the task takes only about 20 seconds to spin up!
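
To put rough numbers on this, here is a back-of-the-envelope sketch using only the figures above. It assumes the gap between the cold and warm start is entirely image pull and extraction time, and that this time scales linearly with image size; both are simplifications, not measurements.

```python
# Figures taken from the question; everything derived from them is an estimate.
IMAGE_SIZE_MB = 500    # approximate image size
COLD_START_S = 180     # about 3 min when the image is not cached on the instance
WARM_START_S = 20      # about 20 s when the image is already cached
SPOT_NOTICE_S = 120    # Spot interruption notice window (2 min)


def pull_overhead_s(cold_s, warm_s):
    """Time attributed to pulling and extracting the image (assumption)."""
    return cold_s - warm_s


def effective_throughput_mb_s(size_mb, overhead_s):
    """Effective pull rate implied by the observed overhead."""
    return size_mb / overhead_s


def max_image_size_mb(notice_s, warm_s, throughput_mb_s):
    """Largest image that would still start inside the notice window,
    assuming pull time scales linearly with size."""
    budget_s = max(notice_s - warm_s, 0)
    return budget_s * throughput_mb_s


overhead = pull_overhead_s(COLD_START_S, WARM_START_S)
throughput = effective_throughput_mb_s(IMAGE_SIZE_MB, overhead)
limit = max_image_size_mb(SPOT_NOTICE_S, WARM_START_S, throughput)

print(f"pull overhead: {overhead} s")                    # 160 s
print(f"effective throughput: {throughput} MB/s")        # 3.125 MB/s
print(f"max image size for a cold start: {limit} MB")    # 312.5 MB
```

Under those assumptions, a cold start only fits in the 2-minute window if the image is roughly 310 MB or smaller, or if the image (or most of its layers) is already on the instance.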

Have you experienced this problem before using ECS?

PS: The images are about 500 MB. Is that large for an image?


There is 1 best solution below

Shoan On

There are some strategies available to you:

  1. Reduce the size of the image by optimising the Dockerfile. A smaller image is quicker to pull from the repository.
  2. Bake the large image into the AMI used by the cluster, so that every new Spot instance already has it. Depending on how the Dockerfile is structured, a significant number of layers can be reused even after the image changes, resulting in quicker image pulls.

Once the image has been pulled to the machine, it is cached, and subsequent pulls will be almost instantaneous.
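
As a minimal sketch of the second strategy, the helper below generates what an AMI build step might run: one `docker pull` per image, plus the lines for `/etc/ecs/ecs.config`. The image name is a placeholder, and `ECS_IMAGE_PULL_BEHAVIOR=prefer-cached` is an addition that is not part of the answer above; it is an ECS agent setting that tells the agent to use a cached image when one is present instead of pulling again. The sketch only builds the commands and config text; it assumes Docker is installed and already authenticated to your registry when you actually run them.

```python
import shlex


def prepull_commands(images):
    """Return one `docker pull` argv list per image reference."""
    return [["docker", "pull", image] for image in images]


def render_ecs_config(prefer_cached=True):
    """Build the text for /etc/ecs/ecs.config on the baked AMI."""
    lines = ["ECS_ENABLE_SPOT_INSTANCE_DRAINING=true"]  # from the question
    if prefer_cached:
        # Assumption beyond the answer: skip the remote pull when the
        # image is already in the local cache.
        lines.append("ECS_IMAGE_PULL_BEHAVIOR=prefer-cached")
    return "\n".join(lines) + "\n"


# Hypothetical image reference; replace it with your own repository and tag.
IMAGES = ["registry.example.com/my-app:1.0"]

for argv in prepull_commands(IMAGES):
    print(shlex.join(argv))  # pass argv to subprocess.run(...) in a real build

print(render_ecs_config(), end="")
```

One thing to watch with `prefer-cached`: if you deploy with a mutable tag such as `latest`, instances may keep running the baked copy, so it is probably safer to combine it with unique tags per release.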