GKE Clusters no indication within metrics or logs after failure

46 Views Asked by At

We are using Google Cloud Platform and our GKE clusters has had multiple "node auto repair" events occur within the last 48 hours, and we can find no indication within the metrics, nor in the logs, that might explain why the control plane required it to do so.

How can I debug this ?

No support from Google team

2

There are 2 best solutions below

2
Nani On

It may be worth to check out this answer

Also, by any chance if you have enabled master logs for your cluster check with NodeNotReady keyword or track node's different metrics in metric explorer.

I have came across many situations where users fail to apply resources limit and that results in such node behaviour

0
dany L On

Try to run the gcloud command below to see the repair events

gcloud container operations list