Here is a sample "rules.yml" file used to monitor a node (Substrate node on Ubuntu 22.04) using Prometheus:
groups:
  - name: alert_rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance $labels.instance down"
          description: "[{{ $labels.instance }}] of job [{{ $labels.job }}] has been down for more than 1 minute."
      - alert: HostHighCpuLoad
        expr: 100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 80
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Host high CPU load (instance bLd Kusama)
          description: "CPU load is > 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
I can't figure out how the variable $labels.instance is resolved. Where do I set this? The generated alerts contain the literal text "$labels.instance" instead of a value representing the instance name.
I went through the Prometheus config file and I'm not sure where to set that value. If anyone could point me in the right direction, it would be a great help. Thanks.
The problem with $labels.instance lies in this line:
summary: "Instance $labels.instance down"
For the value to be substituted properly, it must be written in template form: {{ $labels.instance }}
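As a minimal sketch, the InstanceDown rule from the question could look like this once the summary uses the same template syntax as the description. The wording "5 minutes" in the description is my own adjustment so that it matches `for: 5m`; everything else is taken from the question.

```yaml
groups:
  - name: alert_rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          # The double curly braces are what trigger template substitution;
          # a bare $labels.instance is emitted as literal text.
          summary: "Instance {{ $labels.instance }} down"
          # "5 minutes" is adjusted here to agree with the 5m "for" clause above.
          description: "[{{ $labels.instance }}] of job [{{ $labels.job }}] has been down for more than 5 minutes."
```

Running `promtool check rules rules.yml` before reloading Prometheus should catch syntax errors in the file, although it will not flag a missing pair of braces, since that is still valid YAML.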
Annotations are generated from the labels and the value of the time series that triggered the alert (in this case, up).
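As for where the value comes from: you normally don't set it in the rules file at all. The instance label is attached to every scraped series, and by default it is the target's address (host:port) from the scrape configuration. A rough sketch of a prometheus.yml fragment follows, assuming a node exporter on localhost:9100; the job name, the target address and the custom instance name are all hypothetical and need to be replaced with your own.

```yaml
scrape_configs:
  - job_name: "node"                   # hypothetical job name; becomes {{ $labels.job }}
    static_configs:
      - targets: ["localhost:9100"]    # assumed node exporter address
        labels:
          # Optional: without this override, instance defaults to the
          # target address, i.e. "localhost:9100" in this sketch.
          instance: "my-substrate-node"  # hypothetical friendly name
```

With this in place, {{ $labels.instance }} in the alert annotations would resolve to "my-substrate-node", or to "localhost:9100" if the override is left out.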