I have the OpenTelemetry Collector running as a DaemonSet on each Kubernetes node (EKS), with the Datadog exporter configured. I noticed that when a metric is shipped through the Datadog OTel exporter, it is duplicated: the `host` tag shows the metric reported not only for the pod but also for the EC2 instance it runs on. The same metric, when shipped directly by the Datadog Agent instead of the OTel Collector, is reported correctly.
Example of the `host` tag on the metric in Datadog:
| TAG KEY | COUNT | TAG VALUES           |
| ------- | ----- | -------------------- |
| host    | 8     | host:i-XXXXXXX       |
|         |       | host:Y-pod-server-0  |
|         |       | host:i-XXXX          |
|         |       | host:i-XXXX          |
|         |       | host:Y-pod-server-2  |
|         |       | host:i-XXXXX         |
|         |       | host:Y-pod-server-3  |
|         |       | host:Y-pod-server-1  |
Ideally there would be just 4. When I use the Datadog Agent to export the metric directly instead of the OTel Collector, the `host` tag looks like this in Datadog:
| TAG KEY | COUNT | TAG VALUES           |
| ------- | ----- | -------------------- |
| host    | 4     | host:Y-pod-server-0  |
|         |       | host:Y-pod-server-2  |
|         |       | host:Y-pod-server-3  |
|         |       | host:Y-pod-server-1  |
Below is my OTel Collector ConfigMap:
apiVersion: v1
data:
relay: |
exporters:
datadog:
api:
key: ${env:DD_API_KEY}
site: datadoghq.com
host_metadata:
enabled: true
hostname_source: first_resource
debug: {}
extensions:
health_check: {}
memory_ballast:
size_in_percentage: 40
processors:
resourcedetection/eks:
detectors: [env, eks]
timeout: 2s
override: true
attributes/k8s_cluster_name:
actions:
- action: upsert
key: k8s.cluster.name
value: devel_plane_regional_us-west-2
batch: {}
k8sattributes:
extract:
labels:
- from: pod
key_regex: (.*)
tag_name: $$1
metadata:
- k8s.namespace.name
- k8s.deployment.name
- k8s.statefulset.name
- k8s.daemonset.name
- k8s.cronjob.name
- k8s.job.name
- k8s.node.name
- k8s.pod.name
- k8s.pod.uid
- k8s.pod.start_time
filter:
node_from_env_var: K8S_NODE_NAME
passthrough: false
pod_association:
- sources:
- from: resource_attribute
name: k8s.pod.ip
- sources:
- from: resource_attribute
name: k8s.pod.uid
- sources:
- from: connection
memory_limiter:
check_interval: 5s
limit_percentage: 80
spike_limit_percentage: 25
receivers:
hostmetrics:
collection_interval: 10s
root_path: /hostfs
scrapers:
cpu: null
disk: null
filesystem:
exclude_fs_types:
fs_types:
- autofs
- binfmt_misc
- bpf
- cgroup2
- configfs
- debugfs
- devpts
- devtmpfs
- fusectl
- hugetlbfs
- iso9660
- mqueue
- nsfs
- overlay
- proc
- procfs
- pstore
- rpc_pipefs
- securityfs
- selinuxfs
- squashfs
- sysfs
- tracefs
match_type: strict
exclude_mount_points:
match_type: regexp
mount_points:
- /dev/*
- /proc/*
- /sys/*
- /run/k3s/containerd/*
- /var/lib/docker/*
- /var/lib/kubelet/*
- /snap/*
load: null
memory: null
network: null
jaeger:
protocols:
grpc:
endpoint: ${env:MY_POD_IP}:14250
thrift_compact:
endpoint: ${env:MY_POD_IP}:6831
thrift_http:
endpoint: ${env:MY_POD_IP}:14268
otlp:
protocols:
grpc:
endpoint: ${env:MY_POD_IP}:4317
http:
endpoint: ${env:MY_POD_IP}:4318
prometheus:
config:
scrape_configs:
- job_name: opentelemetry-collector_sp_regional_us-west-2
scrape_interval: 10s
static_configs:
- targets:
- ${env:MY_POD_IP}:8888
- job_name: sp_control_plane_regional_us-west-2
kubernetes_sd_configs:
- role: pod
relabel_configs:
- action: keep
regex: true
source_labels:
- __meta_kubernetes_pod_annotation_prometheus_io_scrape
- action: keep
regex: true
source_labels:
- __meta_kubernetes_pod_annotation_custom_telemetry
- source_labels:
- __meta_kubernetes_pod_container_port_name
regex: metric
action: keep
- regex: __meta_kubernetes_pod_node_name
action: labeldrop
- action: replace
regex: (.+)
source_labels:
- __meta_kubernetes_pod_annotation_prometheus_io_path
target_label: __metrics_path__
scrape_interval: 10s
zipkin:
endpoint: ${env:MY_POD_IP}:9411
service:
extensions:
- health_check
- memory_ballast
pipelines:
logs:
exporters:
- debug
processors:
- k8sattributes
- memory_limiter
- batch
receivers:
- otlp
metrics:
exporters:
- otlp/newrelic
- datadog
- debug
processors:
- k8sattributes
- memory_limiter
- attributes/k8s_cluster_name
- batch
receivers:
- prometheus
- hostmetrics
traces:
exporters:
- debug
processors:
- k8sattributes
- memory_limiter
- batch
receivers:
- otlp
- jaeger
- zipkin
telemetry:
metrics:
address: ${env:MY_POD_IP}:8888
kind: ConfigMap
metadata:
annotations:
meta.helm.sh/release-name: otel-collector
meta.helm.sh/release-namespace: otel-collector
creationTimestamp: "2023-10-11T23:32:52Z"
labels:
app.kubernetes.io/instance: otel-collector
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: opentelemetry-collector
app.kubernetes.io/version: 0.86.0
helm.sh/chart: opentelemetry-collector-0.69.2
name: otel-collector-opentelemetry-collector-agent
namespace: otel-collector
- In the debug log of the OTel Collector, I noticed the following line:
2023-10-12T18:30:11.678Z info provider/provider.go:59 Resolved source {"kind": "exporter", "data_type": "metrics", "name": "datadog", "provider": "ec2", "source": {"Kind":"host","Identifier":"i-XXXXXXXXX"}}
I am wondering whether this happens because the Datadog exporter cannot detect that it is running on EKS, and that is why it reports the EC2 instance ID as the host instead of just the pod.
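To see which resource attributes (and which hostname candidates) actually reach the exporter, I plan to turn up the debug exporter that is already in the metrics pipeline; a minimal sketch, adding only the `verbosity` field to the existing `debug: {}` block:

```yaml
exporters:
  debug:
    verbosity: detailed   # print every exported resource with its full attribute set
```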
To address this, I tried setting `hostname_source` under `host_metadata` in the Datadog exporter to `first_resource` instead of `config_or_system`, but it changed nothing:
host_metadata:
enabled: true
hostname_source: first_resource
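Separately, I am considering forcing the hostname per resource instead of relying on detection. Below is a minimal sketch using the resource processor; the assumption (which I have not verified) is that the Datadog exporter prefers a `datadog.host.name` resource attribute over the hostname resolved by its EC2 provider, and the processor would still need to be added to the metrics pipeline:

```yaml
processors:
  resource/force_host:
    attributes:
      - key: datadog.host.name      # assumption: exporter uses this attribute as the host, if present
        from_attribute: k8s.pod.name
        action: upsert
```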
- I also tried to drop the `__meta_kubernetes_pod_node_name` label using the Prometheus receiver's `kubernetes_sd_configs` relabeling, but that does not seem to work either:
- regex: __meta_kubernetes_pod_node_name
action: labeldrop
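For comparison, if the node name were actually arriving as a label on the scraped series (rather than only as a `__meta_*` discovery label, which I believe is discarded after relabeling anyway), it would have to be removed with `metric_relabel_configs` instead; a minimal sketch, with `node` as a hypothetical label name:

```yaml
- job_name: sp_control_plane_regional_us-west-2
  kubernetes_sd_configs:
    - role: pod
  metric_relabel_configs:
    - action: labeldrop
      regex: node          # hypothetical label name; metric_relabel_configs run after the scrape
```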