I have an application that consumes messages from a Kafka topic. The liveness of the consuming loop is checked by a Yammer gauge called llcPartitionConsuming, which is set to 1 whenever consumption happens and to 0 while the messages are being processed.
The problem I am facing is that when the consuming loop dies for some reason, this metric is not reported at all and any alerts I have on this metric are not evaluated. To detect the absence of this metric, I am using:
absent(llcPartitionConsuming) == 1 in my alert rule definition, but this is triggered only when the consuming loop dies on ALL servers (that is, when Prometheus does not find any time series for this metric AT ALL). However, I want an alert even when the consuming loop dies on a single server (this metric has server and partition as labels). How can I do that?
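
To make the behaviour concrete, here is a rough Python model of what I am seeing. It is only a sketch of the semantics, not Prometheus itself: the server names and the expected-server list are made up for illustration, and the functions are hypothetical helpers, not a real API.

```python
# Simplified model of the behaviour described above: absent() yields a result
# only when the selector matches no series at all.
# All label values here are hypothetical.

def absent(series):
    """Return True only if the selector matched zero series."""
    return len(series) == 0


def missing_servers(series, expected_servers):
    """Return the expected servers that currently report no series."""
    reporting = {labels["server"] for labels in series}
    return sorted(set(expected_servers) - reporting)


expected = ["server-a", "server-b"]

# The loop on server-b has died, so only server-a still reports the gauge.
current = [{"server": "server-a", "partition": "0"}]

print(absent(current))                     # False: one series still exists
print(missing_servers(current, expected))  # ['server-b']
print(absent([]))                          # True: no series anywhere
```

The first check is what my current rule does, and it stays quiet as long as any server still reports the metric. The second check is the per-server behaviour I am after; note that it only works in this sketch because the list of expected servers is supplied explicitly.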