Longhorn Volume Metrics not being exposed


Volume metrics are not being exposed on the /metrics endpoint of the Longhorn manager.

Longhorn version: 1.1.2 or 1.1.1
Kubernetes version: 1.19.9-gke.1900

Node config
OS type and version: Ubuntu with Docker
Disk type: Standard persistent disk 100GB
Underlying infrastructure: GKE

I have a standard GKE cluster with Ubuntu nodes, running GKE version 1.19.9-gke.1900.

I installed Longhorn using kubectl:

kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.1.1/deploy/longhorn.yaml

I tried 1.1.2 earlier and had the same problem. If I exec into the longhorn-manager pod and curl the /metrics endpoint:

kubectl -n longhorn-system exec -it longhorn-manager-9d797 -- curl longhorn-manager-9d797:9500/metrics

I get this Prometheus output:

# HELP longhorn_disk_capacity_bytes The storage capacity of this disk
# TYPE longhorn_disk_capacity_bytes gauge
longhorn_disk_capacity_bytes{disk="default-disk-4cd3831f07717134",node="gke-longhorn-2-default-pool-277a6687-tjgl"} 1.0388023296e+11
# HELP longhorn_disk_reservation_bytes The reserved storage for other applications and system on this disk
# TYPE longhorn_disk_reservation_bytes gauge
longhorn_disk_reservation_bytes{disk="default-disk-4cd3831f07717134",node="gke-longhorn-2-default-pool-277a6687-tjgl"} 3.1164069888e+10
# HELP longhorn_disk_usage_bytes The used storage of this disk
# TYPE longhorn_disk_usage_bytes gauge
longhorn_disk_usage_bytes{disk="default-disk-4cd3831f07717134",node="gke-longhorn-2-default-pool-277a6687-tjgl"} 5.855387648e+09
# HELP longhorn_instance_manager_cpu_requests_millicpu Requested CPU resources in kubernetes of this Longhorn instance manager
# TYPE longhorn_instance_manager_cpu_requests_millicpu gauge
longhorn_instance_manager_cpu_requests_millicpu{instance_manager="instance-manager-e-523d6b01",instance_manager_type="engine",node="gke-longhorn-2-default-pool-277a6687-tjgl"} 113
longhorn_instance_manager_cpu_requests_millicpu{instance_manager="instance-manager-r-9d8f7ae9",instance_manager_type="replica",node="gke-longhorn-2-default-pool-277a6687-tjgl"} 113
# HELP longhorn_instance_manager_cpu_usage_millicpu The cpu usage of this longhorn instance manager
# TYPE longhorn_instance_manager_cpu_usage_millicpu gauge
longhorn_instance_manager_cpu_usage_millicpu{instance_manager="instance-manager-e-523d6b01",instance_manager_type="engine",node="gke-longhorn-2-default-pool-277a6687-tjgl"} 4
longhorn_instance_manager_cpu_usage_millicpu{instance_manager="instance-manager-r-9d8f7ae9",instance_manager_type="replica",node="gke-longhorn-2-default-pool-277a6687-tjgl"} 4
# HELP longhorn_instance_manager_memory_requests_bytes Requested memory in Kubernetes of this longhorn instance manager
# TYPE longhorn_instance_manager_memory_requests_bytes gauge
longhorn_instance_manager_memory_requests_bytes{instance_manager="instance-manager-e-523d6b01",instance_manager_type="engine",node="gke-longhorn-2-default-pool-277a6687-tjgl"} 0
longhorn_instance_manager_memory_requests_bytes{instance_manager="instance-manager-r-9d8f7ae9",instance_manager_type="replica",node="gke-longhorn-2-default-pool-277a6687-tjgl"} 0
# HELP longhorn_instance_manager_memory_usage_bytes The memory usage of this longhorn instance manager
# TYPE longhorn_instance_manager_memory_usage_bytes gauge
longhorn_instance_manager_memory_usage_bytes{instance_manager="instance-manager-e-523d6b01",instance_manager_type="engine",node="gke-longhorn-2-default-pool-277a6687-tjgl"} 7.29088e+06
longhorn_instance_manager_memory_usage_bytes{instance_manager="instance-manager-r-9d8f7ae9",instance_manager_type="replica",node="gke-longhorn-2-default-pool-277a6687-tjgl"} 1.480704e+07
# HELP longhorn_manager_cpu_usage_millicpu The cpu usage of this longhorn manager
# TYPE longhorn_manager_cpu_usage_millicpu gauge
longhorn_manager_cpu_usage_millicpu{manager="longhorn-manager-9d797",node="gke-longhorn-2-default-pool-277a6687-tjgl"} 13
# HELP longhorn_manager_memory_usage_bytes The memory usage of this longhorn manager
# TYPE longhorn_manager_memory_usage_bytes gauge
longhorn_manager_memory_usage_bytes{manager="longhorn-manager-9d797",node="gke-longhorn-2-default-pool-277a6687-tjgl"} 2.9876224e+07
# HELP longhorn_node_count_total Total number of nodes
# TYPE longhorn_node_count_total gauge
longhorn_node_count_total 3
# HELP longhorn_node_cpu_capacity_millicpu The maximum allocatable cpu on this node
# TYPE longhorn_node_cpu_capacity_millicpu gauge
longhorn_node_cpu_capacity_millicpu{node="gke-longhorn-2-default-pool-277a6687-tjgl"} 940
# HELP longhorn_node_cpu_usage_millicpu The cpu usage on this node
# TYPE longhorn_node_cpu_usage_millicpu gauge
longhorn_node_cpu_usage_millicpu{node="gke-longhorn-2-default-pool-277a6687-tjgl"} 256
# HELP longhorn_node_memory_capacity_bytes The maximum allocatable memory on this node
# TYPE longhorn_node_memory_capacity_bytes gauge
longhorn_node_memory_capacity_bytes{node="gke-longhorn-2-default-pool-277a6687-tjgl"} 2.950684672e+09
# HELP longhorn_node_memory_usage_bytes The memory usage on this node
# TYPE longhorn_node_memory_usage_bytes gauge
longhorn_node_memory_usage_bytes{node="gke-longhorn-2-default-pool-277a6687-tjgl"} 1.22036224e+09
# HELP longhorn_node_status Status of this node
# TYPE longhorn_node_status gauge
longhorn_node_status{condition="allowScheduling",condition_reason="",node="gke-longhorn-2-default-pool-277a6687-tjgl"} 1
longhorn_node_status{condition="mountpropagation",condition_reason="",node="gke-longhorn-2-default-pool-277a6687-tjgl"} 1
longhorn_node_status{condition="ready",condition_reason="",node="gke-longhorn-2-default-pool-277a6687-tjgl"} 1
longhorn_node_status{condition="schedulable",condition_reason="",node="gke-longhorn-2-default-pool-277a6687-tjgl"} 1
# HELP longhorn_node_storage_capacity_bytes The storage capacity of this node
# TYPE longhorn_node_storage_capacity_bytes gauge
longhorn_node_storage_capacity_bytes{node="gke-longhorn-2-default-pool-277a6687-tjgl"} 1.0388023296e+11
# HELP longhorn_node_storage_reservation_bytes The reserved storage for other applications and system on this node
# TYPE longhorn_node_storage_reservation_bytes gauge
longhorn_node_storage_reservation_bytes{node="gke-longhorn-2-default-pool-277a6687-tjgl"} 3.1164069888e+10
# HELP longhorn_node_storage_usage_bytes The used storage of this node
# TYPE longhorn_node_storage_usage_bytes gauge
longhorn_node_storage_usage_bytes{node="gke-longhorn-2-default-pool-277a6687-tjgl"} 5.855387648e+09

I have created a sample MySQL pod with a PV, and I can see the volume being provisioned and managed by Longhorn with replicas on all 3 nodes of the cluster. However, I don't see any of the volume metrics listed at https://longhorn.io/docs/1.1.0/monitoring/metrics/#volume

What am I missing here? Any help is appreciated.


There are 2 best solutions below

Vijay Sekhri On

I was able to figure this out. Apparently the volume metrics are exposed by only one manager instance, not by all of them.
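
One way to see which manager pod reports volume metrics is to query each pod in turn. The following is a minimal sketch, not an official Longhorn tool: it assumes the `longhorn-system` namespace and port 9500 from the question, and it assumes the manager pods carry the label `app=longhorn-manager` (as in the default manifest; verify with `kubectl -n longhorn-system get pods --show-labels`). The function names are made up for illustration.

```shell
# Sketch only: namespace, pod label, and port are assumptions; adjust for your install.

# Keep only volume metric samples from Prometheus text read on stdin.
filter_volume_metrics() {
  grep '^longhorn_volume_' || true   # no match is not treated as an error
}

# Query /metrics on every longhorn-manager pod and print its volume metrics.
check_all_managers() {
  local ns="longhorn-system" pod
  for pod in $(kubectl -n "$ns" get pods -l app=longhorn-manager \
                 -o jsonpath='{.items[*].metadata.name}'); do
    echo "== $pod"
    # Same curl as in the question, addressed to the pod by its own name.
    kubectl -n "$ns" exec "$pod" -- curl -s "$pod:9500/metrics" | filter_volume_metrics
  done
}
```

Running `check_all_managers` prints a header per pod; any `longhorn_volume_*` lines appear under the pod that actually reports them, and pods that report none show only the header.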

kasten On

For anyone finding this issue via Google:

Each longhorn-manager pod exposes volume metrics only for the volumes running on the same node. Therefore you need to configure your Prometheus scrape_configs so that all longhorn-manager pods are scraped.

The prometheus-operator should take care of that, but for manual scraping you can use something like:

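      # Entry under scrape_configs in the Prometheus configuration
      - job_name: 'longhorn'
        # Discover every pod so that each longhorn-manager instance becomes its own target
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        # Keep only the longhorn-manager container on port 9500 (label values are joined with ';')
        - source_labels: [__meta_kubernetes_pod_container_name, __meta_kubernetes_pod_container_port_number]
          action: keep
          regex: 'longhorn-manager;9500'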