What is the possible reason that taints and tolerations do not work as I expect in EKS?


I am working with EKS version 1.24 and created 2 node groups: groupA and groupB. GroupB has the taint "dedicated=druid:NoSchedule", but pods without the toleration "dedicated Equal druid NoSchedule" are also scheduled to groupB. What is the possible reason?

My expectation is that only pods with the toleration "dedicated Equal druid NoSchedule" are scheduled to groupB.


There is 1 solution below

Mars On

I had the same problem again in production, but after I restarted all pods several times, all pods were restored to the correct worker nodes.

Then I noticed something odd: every time I found pods on the wrong worker nodes, they had been created very close together in time.

So my guess is that if pods and worker nodes start at the same time, a pod may be scheduled onto a worker node before EKS has applied the taint to it, even though the pod does not tolerate that taint.
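To illustrate that guess, here is a minimal Python sketch of the taint/toleration matching rule as I understand it from the Kubernetes documentation. It is not the scheduler's actual code, and the names `Taint`, `Toleration`, `tolerates` and `can_schedule` are made up for this example. The point is the last check: a node whose taint has not been applied yet has an empty taint list, so a pod with no tolerations passes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key matches every taint key
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if this single toleration matches this single taint."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.key and toleration.key != taint.key:
        return False
    if toleration.operator == "Exists":
        return True
    # An unset operator is treated like "Equal".
    return toleration.operator in ("", "Equal") and toleration.value == taint.value


def can_schedule(tolerations: list, node_taints: list) -> bool:
    """Every hard taint on the node must be matched by some toleration."""
    hard = [t for t in node_taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


group_b_taints = [Taint("dedicated", "druid", "NoSchedule")]
druid_tolerations = [Toleration("dedicated", "Equal", "druid", "NoSchedule")]

print(can_schedule(druid_tolerations, group_b_taints))  # tolerating pod, tainted node
print(can_schedule([], group_b_taints))                 # plain pod, tainted node
print(can_schedule([], []))                             # plain pod, taint not applied yet
```

The three calls print `True`, `False` and `True`. The last case is the situation I suspect: with `NoSchedule`, nothing moves the pod away once the taint arrives later.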

I tried the following to solve this problem, and both work in my environment:

  1. Set a nodeSelector or nodeAffinity on the pod, so that the scheduler checks whether the node has a matching label before placing the pod on that worker node
  2. Change the effect to NoExecute in both the taint and the toleration (a pod that does not tolerate the taint will then be evicted from the node and rescheduled onto another worker node)
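As a rough sketch of option 1, the Python below builds two pod manifests as plain dicts and prints one as JSON. It assumes the nodes carry the label `eks.amazonaws.com/nodegroup=<group name>` (which, as far as I know, managed node groups add); if your nodes use a different label, replace it. The helper names `avoid_nodegroup` and `pod_manifest`, the pod names and the images are placeholders invented for this example.

```python
import json

# Assumption: each node is labelled with the name of its node group.
NODEGROUP_LABEL = "eks.amazonaws.com/nodegroup"


def avoid_nodegroup(group: str) -> dict:
    """Build a required node affinity that rejects nodes labelled with `group`."""
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {"key": NODEGROUP_LABEL, "operator": "NotIn", "values": [group]}
                        ]
                    }
                ]
            }
        }
    }


def pod_manifest(name, image, affinity=None, node_selector=None, tolerations=None) -> dict:
    """Assemble a minimal Pod manifest; optional fields are added only when given."""
    spec = {"containers": [{"name": name, "image": image}]}
    if affinity:
        spec["affinity"] = affinity
    if node_selector:
        spec["nodeSelector"] = node_selector
    if tolerations:
        spec["tolerations"] = tolerations
    return {"apiVersion": "v1", "kind": "Pod", "metadata": {"name": name}, "spec": spec}


# Ordinary pod: must never land on groupB, whether or not the taint is there yet.
general_pod = pod_manifest("web", "nginx", affinity=avoid_nodegroup("groupB"))

# Druid pod: pinned to groupB and tolerating its taint.
druid_pod = pod_manifest(
    "druid",
    "apache/druid",  # placeholder image
    node_selector={NODEGROUP_LABEL: "groupB"},
    tolerations=[
        {"key": "dedicated", "operator": "Equal", "value": "druid", "effect": "NoSchedule"}
    ],
)

print(json.dumps(general_pod, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML. Note that `NotIn` also matches nodes that do not have the label at all, so this only helps if the label is present on the groupB nodes from the moment they register.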

I hope this information helps you resolve your issue.