I am using https://pypi.org/project/django-health-check/ for my health checks in a Django app run through kubernetes_wsgi with the following YAML:
livenessProbe:
  httpGet:
    path: /ht/
    port: 8020
    httpHeaders:
      - name: Host
        value: pdt-staging.nagyv.com
  initialDelaySeconds: 5
  periodSeconds: 10
  successThreshold: 1
  failureThreshold: 10
readinessProbe:
  httpGet:
    path: /ht/
    port: 8020
    httpHeaders:
      - name: Host
        value: pdt-staging.nagyv.com
  initialDelaySeconds: 20
  timeoutSeconds: 5
The pod logs claim that the probe was successful:
INFO:twisted:"-" - - [22/Jul/2022:22:11:07 +0000] "GET /ht/ HTTP/1.1" 200 1411 "-" "kube-probe/1.22"
At the same time, the pod events deny this:
Liveness probe failed: Get "http://10.2.1.43:8020/ht/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
... and after a while, the pod regularly restarts.
The pod seems to be fully functional. I can reach the /ht/ endpoint as well. Everything seems to work, except for the liveness probes.
I read about slow responses causing the issue, but this is pretty fast.
Any idea what the issue might be?
Python containers generally use more memory than you might expect. Running Django behind gunicorn, a container uses around 150M by default in my experience; it also depends on the size of your source code (ours was around 50M) and on the pip packages installed for your app. It is good practice to give a Django/gunicorn container roughly 20% more memory than you expect it to need, and to size the container's resources accordingly.
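As a rough sketch of what that headroom could look like in the pod spec (the numbers are illustrative, not measured from your app; start from your own container's observed usage):

resources:
  requests:
    memory: "200Mi"   # illustrative: ~150M observed usage plus headroom
    cpu: "250m"       # illustrative CPU request
  limits:
    memory: "300Mi"   # leave room above the expected peak
    cpu: "500m"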
You should also increase timeoutSeconds to 20 or 30, depending on how much traffic a single container handles. In addition, consider keeping only one of readinessProbe and livenessProbe on the container; running both generates a lot of extra probe traffic. Use an HPA to scale the app, targeting around 60% CPU and 60% memory, so bursts of traffic are absorbed by more replicas (see the sketches just below). And since you are using threads, keep an eye on the number of active database connections.
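For example, the livenessProbe from the question with an explicit, larger timeout might look like the following; the exact value is a judgment call, not something taken from your cluster:

livenessProbe:
  httpGet:
    path: /ht/
    port: 8020
    httpHeaders:
      - name: Host
        value: pdt-staging.nagyv.com
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 20   # default is 1s, which is easy to exceed under load
  successThreshold: 1
  failureThreshold: 10

And a minimal HPA sketch for the 60% CPU / 60% memory targets; the name, Deployment reference and replica counts are placeholders:

apiVersion: autoscaling/v2beta2   # use autoscaling/v2 on Kubernetes 1.23+
kind: HorizontalPodAutoscaler
metadata:
  name: django-app                # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: django-app              # placeholder Deployment name
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 60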
You are also exporting Django health metrics to Prometheus, which is work on top of the app's normal functionality, so remember to budget extra resources for that too. Prometheus scraping can itself add noticeable overhead, so choose the number of Prometheus instances scraping the same container and the scrape_interval wisely; you don't want the container spending its time serving only internal traffic. For a more detailed discussion of this problem, see https://github.com/kubernetes/kubernetes/issues/89898
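As a side note on the scrape_interval point above, a minimal static Prometheus scrape config could look like this; the job name, metrics path and target are assumptions about your setup, and with the Prometheus Operator you would set the interval on a ServiceMonitor instead:

scrape_configs:
  - job_name: django-app            # placeholder job name
    scrape_interval: 60s            # scrape less often if the app is under pressure
    metrics_path: /metrics          # assumes a django-prometheus style metrics endpoint
    static_configs:
      - targets: ["pdt-staging.nagyv.com:8020"]   # placeholder target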