SQS Timeout with aws-sdk-go-v2 long polling


I have an SQS consumer:

const (
    defaultMaxNumberOfMessages = 10 // Maximum number of messages to receive (up to 10)
    defaultVisibilityTimeout   = 30 // Visibility timeout for the received messages (seconds)
    defaultWaitTimeSeconds     = 20 // Long polling with a 20-second timeout
)

func NewSQSConsumer(awsConf aws.Config, queueURL string, handler Handler) *SQSConsumer {
    httpTimeout := time.Second * (defaultWaitTimeSeconds + 5)
    awsConf.HTTPClient = awsHttp.NewBuildableClient().WithDialerOptions(func(d *net.Dialer) {
        d.Timeout = httpTimeout
    }).WithTransportOptions(func(transport *http.Transport) {
        transport.TLSHandshakeTimeout = httpTimeout
        transport.ResponseHeaderTimeout = httpTimeout
    })

    return &SQSConsumer{
        client:   sqs.NewFromConfig(awsConf),
        queueURL: queueURL,
        handler:  handler,
    }
}

func (c *SQSConsumer) Consume(ctx context.Context) error {
    received, err := c.client.ReceiveMessage(ctx, &sqs.ReceiveMessageInput{
        QueueUrl:              &c.queueURL,
        MaxNumberOfMessages:   defaultMaxNumberOfMessages,
        VisibilityTimeout:     defaultVisibilityTimeout,
        WaitTimeSeconds:       defaultWaitTimeSeconds,
        MessageAttributeNames: []string{"All"},
    })
    if err != nil {
        return err
    }

    // handle messages (elided here)
    _ = received // placeholder so the snippet compiles

    return nil
}

The Consume function is called in an infinite loop, running in a goroutine.

I observe the following behaviour:

time="2023-07-05T09:18:17Z" level=error msg="operation error SQS: ReceiveMessage, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , request send failed, Post \"https://sqs.us-east-1.amazonaws.com/\": dial tcp 3.x.x.1x:443: i/o timeout"
time="2023-07-05T09:19:37Z" level=error msg="operation error SQS: ReceiveMessage, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , request send failed, Post \"https://sqs.us-east-1.amazonaws.com/\": dial tcp 3.x.x.1x:443: i/o timeout"
time="2023-07-05T09:20:54Z" level=error msg="operation error SQS: ReceiveMessage, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , request send failed, Post \"https://sqs.us-east-1.amazonaws.com/\": dial tcp 3.x.x.x:443: i/o timeout"
time="2023-07-05T09:22:10Z" level=error msg="operation error SQS: ReceiveMessage, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , request send failed, Post \"https://sqs.us-east-1.amazonaws.com/\": dial tcp 3.x.x.x:443: i/o timeout"
...
time="2023-07-05T09:22:35Z" level=error msg="operation error SQS: ReceiveMessage, failed to get rate limit token, retry quota exceeded, 0 available, 10 requested"
time="2023-07-05T09:23:00Z" level=error msg="operation error SQS: ReceiveMessage, failed to get rate limit token, retry quota exceeded, 0 available, 10 requested"
time="2023-07-05T09:23:25Z" level=error msg="operation error SQS: ReceiveMessage, failed to get rate limit token, retry quota exceeded, 0 available, 10 requested"
time="2023-07-05T09:23:50Z" level=error msg="operation error SQS: ReceiveMessage, failed to get rate limit token, retry quota exceeded, 0 available, 10 requested"

I tried the suggestion in this answer, but it didn't seem to help in my case: https://stackoverflow.com/a/75772666/494826

The consumer runs in an EKS pod.

VPC Reachability Analyzer reports that my pod can reach all of those SQS IP addresses.

Any clue?
