I have a Prometheus metric of type counter called job_failure_counter. If a job fails, the application increments the metric by one and pushes it to the Prometheus Pushgateway; otherwise it pushes nothing.
As you can imagine, the graph of this metric in Prometheus is a discontinuous line: a short segment appears only when a job fails. Most of the time the job does not fail, so there is no line or point at all.
I created an alert to detect job failures: if rate(job_failure_counter[5m]) > 0, trigger the alert.
However, it does not work. Most of the time there is no data, so rate() returns an empty query result. Even when there is data, there is no sample in the preceding minutes, so the rate still cannot be computed.
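To make the problem concrete, here is a simplified Python sketch of how I understand the behaviour. It is not Prometheus code: `simple_rate` and the `Sample` type are hypothetical names, and the real rate() also handles counter resets and extrapolation, which I ignore here. The only point is that a rate needs at least two samples inside the window.

```python
from typing import List, Optional, Tuple

# Hypothetical sample type: (timestamp in seconds, counter value).
Sample = Tuple[float, float]


def simple_rate(samples: List[Sample], now: float, window: float = 300.0) -> Optional[float]:
    """Simplified per-second rate over the last `window` seconds.

    Assumes `samples` is sorted by timestamp. Returns None when fewer than
    two samples fall inside the window, which models the empty query result.
    """
    in_window = [(t, v) for t, v in samples if now - window <= t <= now]
    if len(in_window) < 2:
        return None  # nothing to compare against, so no rate
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    if t1 == t0:
        return None  # avoid division by zero on duplicate timestamps
    return (v1 - v0) / (t1 - t0)
```

With a single sample in the window, which is my situation right after a failure, this returns None, so a condition like "rate > 0" never has anything to evaluate.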
How can I get an alert in this case? I am thinking of using this logic:
    if current data exists:
        if data in the previous minutes exists and previous data != current data:
            trigger alert
        elif data in the previous minutes does not exist:
            trigger alert
    else:
        do not trigger alert
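The logic above can be written as a small Python function, just to pin down what I mean. This is only a sketch of the decision, not an alert rule: `should_alert` is a hypothetical name, and I am assuming that "no data" can be represented as None.

```python
from typing import Optional


def should_alert(current: Optional[float], previous: Optional[float]) -> bool:
    """Decide whether the alert should fire.

    `current` is the counter value seen now and `previous` is the value seen
    in the previous minutes. None means no sample exists (an assumption made
    for this sketch).
    """
    if current is None:
        return False  # no current data: do not trigger
    if previous is None:
        return True  # data just appeared: treat it as a new failure
    return previous != current  # trigger only if the counter changed
```

For example, `should_alert(3, None)` and `should_alert(3, 2)` both fire, while `should_alert(3, 3)` and `should_alert(None, 2)` do not.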
This logic sounds complex, and I do not know how to express it in an alert rule.
Question: how can I create an alert for job failures when the data is discontinuous?