Is it normal for AWS CloudWatch alarms metrics to be stuck at some value?


I have set up an alarm that measures ApproximateNumberOfMessagesVisible on an SQS queue. When it reaches 1, it spins up an EC2 instance, and when it drops to 0 it should trigger the Auto Scaling group to scale in. However, even though the SQS queue has 0 messages, the alarm still reports 1 message and stays in the OK state, as if it doesn't recognize the change. I know it's an 'approximate' number of messages, but this can go on for days with the queue at 0 while the alarm still shows 1. Is there any clue as to why this is happening?


1 Answer

Answered by John Rotenstein

I have also observed that Amazon SQS does not send "zero" metrics to Amazon CloudWatch. My personal theory is that this reduces the amount of 'chatter' that SQS needs to send to CloudWatch because there must be a huge number of empty queues.

Fortunately, you can cater for this by going to the Additional Configuration section of the CloudWatch Alarm and setting Missing data treatment to Treat missing data as bad (breaching threshold). This triggers the alarm just as if the metric had reported a zero value.

[Screenshot: CloudWatch Alarm configuration]
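As a rough sketch of the same idea in code (the alarm name, queue name and scaling-policy ARN below are placeholders, not from the question), the equivalent setting can be applied with boto3's put_metric_alarm by passing TreatMissingData='breaching':

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Scale-in alarm: fires when the queue reports no visible messages,
# and also when SQS publishes no data points at all (an empty queue).
cloudwatch.put_metric_alarm(
    AlarmName="sqs-queue-empty-scale-in",                            # placeholder name
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": "my-work-queue"}],    # placeholder queue
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="LessThanOrEqualToThreshold",
    # Missing data points count as breaching, so a queue that stops
    # reporting metrics still triggers the scale-in alarm.
    TreatMissingData="breaching",
    AlarmActions=["arn:aws:autoscaling:REGION:ACCOUNT:scalingPolicy:..."],  # placeholder ARN
)
```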

When configuring scaling based on SQS queue size, remember the golden rule: Scale-out quickly, scale-in slowly. That is, turn on resources reasonably quickly to respond to incoming messages, but don't be too hasty to scale in. Set the scale-in alarm to trigger only after a reasonable period of time relative to how long it takes to launch new resources. So, if you are launching EC2 instances and an instance takes 2 minutes to launch, wait at least 5 minutes before scaling in, as in the sketch below.
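For example, a sketch of how the alarm timing parameters might differ (the one-minute period and five-period scale-in window are assumed numbers, chosen to match the 5-minute guideline above):

```python
# Scale-out: react quickly -- a single 60-second period with messages
# visible is enough to trigger the scale-out policy.
scale_out_params = dict(
    Period=60,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
)

# Scale-in: react slowly -- require five consecutive 60-second periods
# (about 5 minutes) of an empty queue before downscaling, and treat
# missing data as breaching so a silent queue still counts as empty.
scale_in_params = dict(
    Period=60,
    EvaluationPeriods=5,
    Threshold=0,
    ComparisonOperator="LessThanOrEqualToThreshold",
    TreatMissingData="breaching",
)
```

These dictionaries would be merged into the remaining put_metric_alarm arguments (metric, dimensions, actions) for each of the two alarms.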