I am using https://pypi.org/project/django-health-check/ for my health checks in a Django app run through kubernetes_wsgi with the following YAML:
livenessProbe:
  httpGet:
    path: /ht/
    port: 8020
    httpHeaders:
      - name: Host
        value: pdt-staging.nagyv.com
  initialDelaySeconds: 5
  periodSeconds: 10
  successThreshold: 1
  failureThreshold: 10
readinessProbe:
  httpGet:
    path: /ht/
    port: 8020
    httpHeaders:
      - name: Host
        value: pdt-staging.nagyv.com
  initialDelaySeconds: 20
  timeoutSeconds: 5
The pod logs claim that the probe was successful:
INFO:twisted:"-" - - [22/Jul/2022:22:11:07 +0000] "GET /ht/ HTTP/1.1" 200 1411 "-" "kube-probe/1.22"
At the same time, the pod events deny this:
Liveness probe failed: Get "http://10.2.1.43:8020/ht/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
... and after a while, the pod regularly restarts.
The pod seems to be fully functional. I can reach the /ht/ endpoint as well. Everything seems to work, except for the liveness probes.
I read about slow responses causing the issue, but this is pretty fast.
Any idea what the issue might be?
Python containers generally use more memory than you might expect. Running Django behind gunicorn, a container uses around 150M by default in my experience; it also depends on the size of your source code (ours was around 50M) and on the pip packages installed for your app. It is good practice to give a Django/gunicorn container roughly 20% more memory than you expect it to need, and to size the container's resources accordingly.
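As a rough sketch of what that headroom could look like in the pod spec (the numbers are illustrative, not measured from your app; start from your own container's observed usage):

resources:
  requests:
    memory: "200Mi"   # illustrative: ~150M observed usage plus headroom
    cpu: "250m"       # illustrative CPU request
  limits:
    memory: "300Mi"   # leave room above the expected peak
    cpu: "500m"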
You should also increase timeoutSeconds to 20 or 30, depending on how much traffic a single container handles. In addition, consider keeping only one of readinessProbe and livenessProbe on the container; running both generates a lot of extra probe traffic. Use an HPA to scale the app, targeting around 60% CPU and 60% memory, so bursts of traffic are absorbed by more replicas (see the sketches just below). And since you are using threads, keep an eye on the number of active database connections.
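For example, the livenessProbe from the question with an explicit, larger timeout might look like the following; the exact value is a judgment call, not something taken from your cluster:

livenessProbe:
  httpGet:
    path: /ht/
    port: 8020
    httpHeaders:
      - name: Host
        value: pdt-staging.nagyv.com
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 20   # default is 1s, which is easy to exceed under load
  successThreshold: 1
  failureThreshold: 10

And a minimal HPA sketch for the 60% CPU / 60% memory targets; the name, Deployment reference and replica counts are placeholders:

apiVersion: autoscaling/v2beta2   # use autoscaling/v2 on Kubernetes 1.23+
kind: HorizontalPodAutoscaler
metadata:
  name: django-app                # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: django-app              # placeholder Deployment name
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 60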
You are also exporting Django health metrics to Prometheus, which is work on top of the app's normal functionality, so remember to budget extra resources for that too. Prometheus scraping can itself add noticeable overhead, so choose the number of Prometheus instances scraping the same container and the scrape_interval wisely; you don't want the container spending its time serving only internal traffic. For a more detailed discussion of this problem, see https://github.com/kubernetes/kubernetes/issues/89898
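As a side note on the scrape_interval point above, a minimal static Prometheus scrape config could look like this; the job name, metrics path and target are assumptions about your setup, and with the Prometheus Operator you would set the interval on a ServiceMonitor instead:

scrape_configs:
  - job_name: django-app            # placeholder job name
    scrape_interval: 60s            # scrape less often if the app is under pressure
    metrics_path: /metrics          # assumes a django-prometheus style metrics endpoint
    static_configs:
      - targets: ["pdt-staging.nagyv.com:8020"]   # placeholder target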