kubernetes ingress-nginx | Readiness probe failed |


Hi all, I'm finishing the installation of a brand-new cluster and I'm facing a strange issue.

I'm deploying the ingress-nginx controller both via manifest and via Helm chart, but both give me the same result:

kubectl get po 
nginx-ingress-dx6bg             0/1     Running             3 (26s ago)     3m44s   10.244.4.118   node-2   <none>           <none>
nginx-ingress-gqkhz             0/1     Running             3 (29s ago)     3m47s   10.244.3.16    node-1   <none>           <none>
nginx-ingress-dx6bg             0/1     Error               3 (86s ago)     4m44s   10.244.4.118   node-2   <none>           <none>
nginx-ingress-gqkhz             0/1     Error               3 (89s ago)     4m47s   10.244.3.16    node-1   <none>           <none>
nginx-ingress-dx6bg             0/1     CrashLoopBackOff    3 (12s ago)     4m56s   10.244.4.118   node-2   <none>           <none>
nginx-ingress-gqkhz             0/1     CrashLoopBackOff    3 (13s ago)     4m59s   10.244.3.16    node-1   <none>           <none>
nginx-ingress-gqkhz             0/1     Running             4 (44s ago)     5m30s   10.244.3.16    node-1   <none>           <none>
nginx-ingress-dx6bg             0/1     Running             4 (51s ago)     5m35s   10.244.4.118   node-2   <none>           <none>
nginx-ingress-b9fcfbb59-hwjc8   0/1     Running             6 (2m49s ago)   12m     10.244.4.116   node-2   <none>           <none>

and describing the pod, the failure is in the readiness probe:

kubectl describe po -n nginx-ingress nginx-ingress-b9fcfbb59-hwjc8
Name:             nginx-ingress-b9fcfbb59-hwjc8
Namespace:        nginx-ingress
Priority:         0
Service Account:  nginx-ingress
Node:             node-2/192.168.17.15
Start Time:       Thu, 08 Feb 2024 17:09:37 +0100
Labels:           app=nginx-ingress
                  app.kubernetes.io/name=nginx-ingress
                  app.kubernetes.io/version=3.4.2
                  app.nginx.org/version=1.25.3
                  pod-template-hash=b9fcfbb59
Annotations:      <none>
Status:           Running
SeccompProfile:   RuntimeDefault
IP:               10.244.4.116
IPs:
  IP:           10.244.4.116
Controlled By:  ReplicaSet/nginx-ingress-b9fcfbb59
Containers:
  nginx-ingress:
    Container ID:  containerd://57299408237d9d8b1b7be67ac12d6999640ff2249305c8d289a78a58fe6b38c9
    Image:         nginx/nginx-ingress:3.4.2
    Image ID:      docker.io/nginx/nginx-ingress@sha256:4b97f1d3466c804d51abbdeb84f2c7c3ea00d6a937a320d62a4cf9d6b447d6ad
    Ports:         80/TCP, 443/TCP, 8081/TCP, 9113/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    Args:
      -nginx-configmaps=$(POD_NAMESPACE)/nginx-config
    State:          Running
      Started:      Thu, 08 Feb 2024 17:17:51 +0100
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Thu, 08 Feb 2024 17:15:30 +0100
      Finished:     Thu, 08 Feb 2024 17:16:30 +0100
    Ready:          False
    Restart Count:  5
    Requests:
      cpu:      100m
      memory:   128Mi
    Readiness:  http-get http://:readiness-port/nginx-ready delay=0s timeout=1s period=1s #success=1 #failure=3
    Environment:
      POD_NAMESPACE:  nginx-ingress (v1:metadata.namespace)
      POD_NAME:       nginx-ingress-b9fcfbb59-hwjc8 (v1:metadata.name)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vlfd8 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-vlfd8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                      From               Message
  ----     ------     ----                     ----               -------
  Normal   Scheduled  8m57s                    default-scheduler  Successfully assigned nginx-ingress/nginx-ingress-b9fcfbb59-hwjc8 to node-2
  Normal   Pulling    8m57s                    kubelet            Pulling image "nginx/nginx-ingress:3.4.2"
  Normal   Pulled     8m35s                    kubelet            Successfully pulled image "nginx/nginx-ingress:3.4.2" in 21.588s (21.589s including waiting)
  Normal   Created    8m35s                    kubelet            Created container nginx-ingress
  Normal   Started    8m35s                    kubelet            Started container nginx-ingress
  Warning  Unhealthy  3m56s (x250 over 8m34s)  kubelet            Readiness probe failed: Get "http://10.244.4.116:8081/nginx-ready": dial tcp 10.244.4.116:8081: connect: connection refused
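
To tell whether this is a network problem or the controller process itself, you can hit the same readiness endpoint the kubelet probes, but from inside the pod. This is only a sketch; the namespace and pod name are the ones from the describe output above, and it assumes curl is present in the image (if not, try wget -qO- instead):

```shell
# Namespace and pod name taken from the describe output above; adjust as needed.
NS=nginx-ingress
POD=nginx-ingress-b9fcfbb59-hwjc8

# Probe the readiness endpoint from inside the pod. A "connection refused"
# here too means the controller never bound :8081, i.e. the process is dying
# before it becomes ready -- the probe failure is a symptom, not the cause.
kubectl exec -n "$NS" "$POD" -- curl -sS http://127.0.0.1:8081/nginx-ready || true
```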

Following a known issue documented by NGINX, I instructed Helm to raise the timeouts, but without any positive result:

helm install nginx-ingress-controller nginx-stable/nginx-ingress  --set rbac.create=true --set controller."nodeSelector\.kubernetes\.io/hostname"=node-2 --set nginxReloadTimeout=20000
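
As an alternative to reinstalling, the probe timing can also be relaxed directly on the live object with a JSON patch. A minimal sketch, assuming the Deployment is named nginx-ingress and the controller is the first container in the pod spec (the defaults in the describe output above were delay=0s, period=1s):

```shell
# JSON patch that gives the controller more headroom before the first
# readiness check and between checks.
PATCH='[
  {"op": "add", "path": "/spec/template/spec/containers/0/readinessProbe/initialDelaySeconds", "value": 30},
  {"op": "replace", "path": "/spec/template/spec/containers/0/readinessProbe/periodSeconds", "value": 5}
]'
kubectl -n nginx-ingress patch deployment nginx-ingress --type=json -p="$PATCH" || true
```

Note that this only buys the controller time: with the container exiting with code 255, the probe timing alone is unlikely to be the root cause, and the container logs are the real lead.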

Do you have any suggestions, possibly without resetting the whole cluster?

On a different cluster the same installation worked correctly.

1 Answer

Answer from hasouscsed:

Looking at it purely from a deployment perspective: first, rule out a resource issue. Describe one of the stopped replicas (nginx-ingress-gqkhz or nginx-ingress-dx6bg) and check the error. I'd also suggest scaling down to 1 or 2 replicas and seeing whether the container starts. A failing readiness probe by itself doesn't tell you much.
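
Those steps, sketched as commands (the pod name is taken from the question, and the Deployment name nginx-ingress is an assumption; adjust to your cluster):

```shell
NS=nginx-ingress

# Describe one of the failing replicas and check its events and Last State.
kubectl describe po -n "$NS" nginx-ingress-gqkhz || true

# Scale the controller Deployment down to a single replica (assuming it is
# named nginx-ingress) and watch whether that one pod comes up.
kubectl -n "$NS" scale deployment nginx-ingress --replicas=1 || true
kubectl get po -n "$NS" || true
```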

Also, even though the container shows as Running, read its logs (kubectl logs -n <namespace> <pod> -c <container>). That might give you some info.
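
For a pod stuck in CrashLoopBackOff, the --previous flag is the useful one, since it shows the run that actually died. Pod name taken from the question:

```shell
POD=nginx-ingress-b9fcfbb59-hwjc8

# Logs of the current (restarting) container...
kubectl logs -n nginx-ingress "$POD" || true

# ...and of the previous, crashed run -- this is where the actual error
# behind exit code 255 usually shows up.
kubectl logs -n nginx-ingress "$POD" --previous || true
```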

Although I see CrashLoopBackOff on some replicas, I would rule out a network issue, since the replicas have successfully pulled the image.