I deployed a Fluent Bit DaemonSet and an Elasticsearch StatefulSet (exposed via a NodePort Service) using these manifests:

fluent-bit-daemonset.yml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  labels:
    k8s-app: fluent-bit-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: fluent-bit-logging
      version: v1
      kubernetes.io/cluster-service: "true"
  template:
    metadata:
      labels:
        k8s-app: fluent-bit-logging
        version: v1
        kubernetes.io/cluster-service: "true"
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "2020"
        prometheus.io/path: /api/v1/metrics/prometheus
    spec:
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:0.14.2
        ports:
        - containerPort: 2020
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch-0.elasticsearch.default.svc.cluster.local"
        - name: FLUENT_ELASTICSEARCH_PORT 
          value: "9200"
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
        - name: mnt
          mountPath: /mnt
          readOnly: true
      terminationGracePeriodSeconds: 10
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
      - name: mnt
        hostPath:
          path: /mnt
      serviceAccountName: fluent-bit
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
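
The fluent-bit-config ConfigMap referenced above isn't included here; its Elasticsearch output section is what consumes the two env vars, roughly like the stock example from the official Fluent Bit DaemonSet manifests:

[OUTPUT]
    # ship everything to Elasticsearch at the host/port taken from the env vars
    Name            es
    Match           *
    Host            ${FLUENT_ELASTICSEARCH_HOST}
    Port            ${FLUENT_ELASTICSEARCH_PORT}
    Logstash_Format On
    Retry_Limit     False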

elasticsearch.yml:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: elasticsearch
    component: elasticsearch
    release: elasticsearch
  name: elasticsearch
spec:
  podManagementPolicy: Parallel
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: elasticsearch
      component: elasticsearch
      release: elasticsearch
  serviceName: elasticsearch
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: elasticsearch
        component: elasticsearch
        release: elasticsearch
    spec:
      containers:
      - env:
        - name: cluster.name
          value: dev-cluster
        - name: discovery.type
          value: single-node
        - name: ES_JAVA_OPTS
          value: -Xms512m -Xmx512m
        - name: bootstrap.memory_lock
          value: "false"
        - name: xpack.security.enabled
          value: "false"
        image: elasticsearch:8.12.0
        imagePullPolicy: IfNotPresent
        name: elasticsearch
        ports:
        - containerPort: 9200
          name: http
          protocol: TCP
        - containerPort: 9300
          name: transport
          protocol: TCP
        resources:
          limits:
            cpu: 1
            memory: 3Gi
          requests:
            cpu: 250m
            memory: 512Mi
        securityContext:
          privileged: true
          runAsUser: 1000
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/share/elasticsearch/data
          name: elasticsearch-data
      dnsPolicy: ClusterFirst
      initContainers:
      - command:
        - sh
        - -c
        - |
          # run every setup step in one shell invocation (only the single string
          # passed after -c is executed)
          chown -R 1000:1000 /usr/share/elasticsearch/data
          sysctl -w vm.max_map_count=262144
          chmod 777 /usr/share/elasticsearch/data
          chmod 777 /usr/share/elasticsearch/data/node
          chmod g+rwx /usr/share/elasticsearch/data
          chgrp 1000 /usr/share/elasticsearch/data
        image: busybox:1.29.2
        imagePullPolicy: IfNotPresent
        name: set-dir-owner
        resources: {}
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/share/elasticsearch/data
          name: elasticsearch-data
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 10
  updateStrategy:
    type: OnDelete
  volumeClaimTemplates:
  - metadata:
      creationTimestamp: null
      name: elasticsearch-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi

elastic-service.yml:

---
apiVersion: v1
kind: Service
metadata:
  name: eks-srv
spec:
  selector:
    app: elasticsearch
    component: elasticsearch
  ports:
    - name: db
      protocol: TCP
      port: 9200
      targetPort: 9200
    - name: monitoring
      protocol: TCP
      port: 9300
      targetPort: 9300
  type: NodePort


But when I check the Fluent Bit pod's logs (kubectl logs pods/fluent-bit-bwcl6), I get the following:

[2024/01/31 04:58:05] [ info] [engine] started (pid=1)
[2024/01/31 04:58:06] [ info] [filter_kube] https=1 host=kubernetes.default.svc.cluster.local port=443
[2024/01/31 04:58:06] [ info] [filter_kube] local POD info OK
[2024/01/31 04:58:06] [ info] [filter_kube] testing connectivity with API server...
[2024/01/31 04:58:06] [ warn] net_tcp_fd_connect: getaddrinfo(host='kubernetes.default.svc.cluster.local'): Temporary failure in name resolution
[2024/01/31 04:58:06] [error] [filter_kube] upstream connection error
[2024/01/31 04:58:06] [ warn] [filter_kube] could not get meta for POD fluent-bit-bwcl6
[2024/01/31 04:58:06] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2024/01/31 04:58:11] [ warn] net_tcp_fd_connect: getaddrinfo(host='kubernetes.default.svc.cluster.local'): Temporary failure in name resolution
[2024/01/31 04:58:11] [error] [filter_kube] upstream connection error
[2024/01/31 04:58:21] [ warn] net_tcp_fd_connect: getaddrinfo(host='kubernetes.default.svc.cluster.local'): Temporary failure in name resolution
[2024/01/31 04:58:21] [error] [filter_kube] upstream connection error
[2024/01/31 04:58:21] [ warn] net_tcp_fd_connect: getaddrinfo(host='kubernetes.default.svc.cluster.local'): Temporary failure in name resolution
[2024/01/31 04:58:21] [error] [filter_kube] upstream connection error
[2024/01/31 04:58:31] [ warn] net_tcp_fd_connect: getaddrinfo(host='kubernetes.default.svc.cluster.local'): Temporary failure in name resolution
[2024/01/31 04:58:31] [error] [filter_kube] upstream connection error
getaddrinfo(host='kubernetes.default.svc.cluster.local'): Temporary failure in name resolution
[2024/01/31 04:58:41] [error] [filter_kube] upstream connection error
[2024/01/31 04:58:51] [ warn] net_tcp_fd_connect: getaddrinfo(host='kubernetes.default.svc.cluster.local'): Temporary failure in name resolution
[2024/01/31 04:58:51] [error] [filter_kube] upstream connection error
[2024/01/31 04:59:16] [ warn] net_tcp_fd_connect: getaddrinfo(host='elasticsearch-0.elasticsearch.default.svc.cluster.local'): Temporary failure in name resolution
[2024/01/31 04:59:21] [ warn] net_tcp_fd_connect: getaddrinfo(host='elasticsearch-0.elasticsearch.default.svc.cluster.local'): Temporary failure in name resolution
[2024/01/31 04:59:27] [ warn] net_tcp_fd_connect: getaddrinfo(host='elasticsearch-0.elasticsearch.default.svc.cluster.local'): Temporary failure in name resolution
[2024/01/31 04:59:37] [ warn] net_tcp_fd_connect: getaddrinfo(host='kubernetes.default.svc.cluster.local'): Temporary failure in name resolution
[2024/01/31 04:59:37] [error] [filter_kube] upstream connection error
[2024/01/31 04:59:37] [ warn] net_tcp_fd_connect: getaddrinfo(host='kubernetes.default.svc.cluster.local'): Temporary failure in name resolution
[2024/01/31 04:59:37] [error] [filter_kube] upstream connection error
[2024/01/31 04:59:47] [ warn] net_tcp_fd_connect: getaddrinfo(host='kubernetes.default.svc.cluster.local'): Temporary failure in name resolution
[2024/01/31 04:59:47] [error] [filter_kube] upstream connection error
[2024/01/31 04:59:47] [ warn] net_tcp_fd_connect: getaddrinfo(host='kubernetes.default.svc.cluster.local'): Temporary failure in name resolution
[2024/01/31 04:59:47] [error] [filter_kube] upstream connection error
[2024/01/31 04:59:57] [ warn] net_tcp_fd_connect: getaddrinfo(host='elasticsearch-0.elasticsearch.default.svc.cluster.local'): Temporary failure in name resolution

.
.
.

Why can't it see Elasticsearch?

Both are deployed in the same namespace (default).

Accepted answer (Behnia FB):

The problem was:

  1. The DNS of my Kubernetes cluster (or maybe my whole VM's DNS) was having problems, so I:

    1. Deleted the node-local-dns pods, because they had cached the wrong DNS servers for my node (I had a single-node cluster, so I didn't need node-local-dns as a DNS cache anyway)
    2. Set the correct new upstream DNS servers in the CoreDNS ConfigMap (sketched below). This fixed the failures resolving kubernetes.default.svc.cluster.local
  2. I had another problem, which kept Elasticsearch from resolving: as you can see in my Fluent Bit manifest, I had set:

env:
- name: FLUENT_ELASTICSEARCH_HOST
  value: "elasticsearch-0.elasticsearch.default.svc.cluster.local"

This is WRONG. It should be the name of the Elasticsearch Service (in my case eks-srv, as you can see in elastic-service.yml):

value: "eks-srv.default.svc.cluster.local"

And now my Fluent Bit reaches Elasticsearch through the eks-srv Service.
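
For reference, this is roughly what the two fixes look like. It is only a sketch: the upstream DNS IPs, image tag, deployment name, and test pod name are examples, so adjust them for your own cluster:

# 1) Point CoreDNS at a working upstream DNS server instead of the node's stale resolv.conf
#    (kubectl -n kube-system edit configmap coredns)
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . 8.8.8.8 1.1.1.1    # upstream resolvers; was "forward . /etc/resolv.conf"
    cache 30
    loop
    reload
}

# 2) Restart CoreDNS so it picks up the new Corefile (the deployment name may
#    differ depending on how the cluster was set up), then verify that the
#    Service name resolves from inside the cluster
kubectl -n kube-system rollout restart deployment coredns
kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- \
  nslookup eks-srv.default.svc.cluster.local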

Answer (Kapil Sakhare):

It seems you might be encountering connectivity issues with the Kubernetes API server. That could be why you cannot reach Elasticsearch, even though it is in the same namespace.

According to the official Fluent Bit troubleshooting documentation, check the Kubernetes-specific roles, the IPv6 settings, and connectivity to the Kube_URL.

“When Fluent Bit is deployed as a DaemonSet it generally runs with specific roles that allow the application to talk to the Kubernetes API server. If you are deployed in a more restricted environment check that all the Kubernetes roles are set correctly.” Service accounts are used to authenticate pods to other services; check that your pods use a service account with the permissions needed to access the other pods.
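
For example, the DaemonSet in the question sets serviceAccountName: fluent-bit, but the RBAC objects behind it are not shown. Make sure something equivalent to the following exists (a minimal sketch of the usual Fluent Bit RBAC, assuming everything lives in the default namespace):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-read
rules:
- apiGroups: [""]
  # pod and namespace metadata is what the kubernetes filter uses to enrich logs
  resources: ["namespaces", "pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit-read
subjects:
- kind: ServiceAccount
  name: fluent-bit
  namespace: default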

Also check this GitHub issue about connectivity to the KUBE_URL for insights from the community.