AWS Batch in privileged mode: urllib3.exceptions.ConnectTimeoutError and botocore.exceptions.ConnectTimeoutError

My AWS Batch job in privileged mode has the following issue with boto/botocore:

TimeoutError: timed out
The above exception was the direct cause of the following exception:

urllib3.exceptions.ConnectTimeoutError: (<botocore.awsrequest.AWSHTTPConnection object at 0x7f858aa9b700>, 'Connection to 169.254.170.2 timed out. (connect timeout=2)')

botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "http://169.254.170.2/v2/credentials/f379b1f3-1673-43b3-9ae7-523b2534be77"

botocore.exceptions.MetadataRetrievalError: Error retrieving metadata: Received error when attempting to retrieve container metadata: Connect timeout on endpoint URL: "http://169.254.170.2/v2/credentials/f379b1f3-1673-43b3-9ae7-523b2534be77"

botocore.exceptions.CredentialRetrievalError: Error when retrieving credentials from container-role: Error retrieving metadata: Received error when attempting to retrieve container metadata: Connect timeout on endpoint URL: "http://169.254.170.2/v2/credentials/f379b1f3-1673-43b3-9ae7-523b2534be77"
  • Roles and policies look fine
  • Security groups allow all outbound traffic

What's wrong?

Answer by Vincent Claes

Add this to your Dockerfile:

RUN update-alternatives --set iptables /usr/sbin/iptables-legacy
RUN update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy

The issue stems from a mix-up between the two iptables backends (legacy and nftables).

It arises in AWS Batch when the container image defaults to the nftables backend for iptables, as images based on Ubuntu 22.04 do.
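To check which backend an image selects by default, one option is to look at the output of `iptables --version`: iptables 1.8 and later append the backend name in parentheses, for example `(nf_tables)` or `(legacy)`. The following is a minimal sketch of that check, not part of the original answer. The function names are hypothetical, and older iptables releases that print no backend are reported as `unknown`.

```python
import re
import subprocess


def detect_iptables_backend(version_output: str) -> str:
    """Classify one `iptables --version` line as 'nf_tables', 'legacy', or 'unknown'."""
    match = re.search(r"\((nf_tables|legacy)\)", version_output)
    return match.group(1) if match else "unknown"


def current_iptables_backend() -> str:
    """Run `iptables --version` and classify its output (hypothetical helper)."""
    try:
        result = subprocess.run(
            ["iptables", "--version"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # iptables is not installed or not executable in this image.
        return "unknown"
    return detect_iptables_backend(result.stdout)


if __name__ == "__main__":
    print(f"iptables backend: {current_iptables_backend()}")
```

Running this inside the image before and after the `update-alternatives` lines should show the backend change from `nf_tables` to `legacy`, assuming the image ships iptables 1.8 or later.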

In our case, the container image used by AWS Batch started a docker-in-docker daemon on startup, and the job ran with the --privileged flag to make that possible.

When that Docker daemon uses iptables internally, nftables gets loaded into the host OS kernel. This disrupts the existing legacy iptables configuration, including the port forwarding that the AWS ECS Agent relies on, which would explain why requests to the credentials endpoint at 169.254.170.2 time out.
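To confirm whether the fix restored the forwarding, a quick connectivity check against the credentials endpoint can help. The sketch below is an illustration rather than part of the original answer: it only tests whether a TCP connection to 169.254.170.2 on port 80 (the address from the traceback) can be opened within the same 2-second timeout botocore reported. The function name `endpoint_reachable` is hypothetical.

```python
import socket


def endpoint_reachable(host: str = "169.254.170.2", port: int = 80,
                       timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False


if __name__ == "__main__":
    if endpoint_reachable():
        print("Credentials endpoint is reachable.")
    else:
        print("Credentials endpoint is NOT reachable; check the host iptables rules.")
```

A successful connection only shows that the endpoint answers on TCP. It does not prove that the task role credentials themselves are valid.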

source: https://repost.aws/questions/QUCFqv7OfoQlygJrmwfkJ24Q/various-aws-apis-fail-due-to-timeout