I have a gauge metric badness which goes up when my service is performing poorly. There is one gauge per instance of the service and I have many instances.
I can take a max over all instances so that I can see how bad the worst instance is:
max(badness)
This graph is noisy because the identity of the worst instance, and how bad it is, changes frequently. I would like to smooth it out by applying a moving average. However, this doesn't work (I get a PromQL syntax error):
avg_over_time(max(badness)[1m])
How can I apply avg_over_time() to a timeseries that has already been aggregated with the max() operator?
My backend is VictoriaMetrics so I can use either MetricsQL or pure PromQL.
The avg_over_time(max(process_resident_memory_bytes)[5m]) query works without issues in VictoriaMetrics. It may fail if you use promxy in front of VictoriaMetrics, since promxy doesn't support MetricsQL - see this issue for details.
The query can be fixed so that it also works in Prometheus and promxy - just add a colon after 5m in square brackets:
avg_over_time(max(process_resident_memory_bytes)[5m:])
This is called a subquery in the Prometheus world. See more details about subquery specifics in VictoriaMetrics and Prometheus in this article.
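Applied to the badness metric from the question, the same fix might look like the sketch below. The 15s resolution in the second query is an assumed value for illustration, not something taken from the question; it should be small enough that several points fall inside the 1m window, otherwise there is little to average.

```promql
# Subquery form: the colon inside the brackets turns max(badness)
# into a range that avg_over_time() can consume.
avg_over_time(max(badness)[1m:])

# Same query with an explicit subquery resolution (15s is an assumed value).
# max(badness) is evaluated every 15s over the last minute, then averaged.
avg_over_time(max(badness)[1m:15s])
```

When the resolution after the colon is omitted, Prometheus falls back to the global evaluation interval, so the explicit form can be useful if that default is too coarse for a short window.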