I have a number of Jobs running across multiple nodes in the cluster. At seemingly random times, the pods are terminated and the Job creates a replacement pod.
Each pod has a node affinity assigned to it, using preferredDuringSchedulingIgnoredDuringExecution with a weight. Is it possible for the node affinity to cause the pods to be evicted?
I have tried setting a priority on the pods using a PriorityClass with a high value and preemptionPolicy set to Never, like below:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-nonpreempting
value: 1000000
preemptionPolicy: Never
globalDefault: false
description: "This priority class will not cause other pods to be preempted."
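For reference, this is roughly how I verified that the class is registered and that the pods are admitted with it (the pod name and namespace are the ones from the example further down):

# Confirm the PriorityClass exists with the expected value and preemption policy
kubectl get priorityclass high-priority-nonpreempting

# Confirm the pod was admitted with that class and the resolved priority value
kubectl get pod testingpod -n default -o jsonpath='{.spec.priorityClassName}{" "}{.spec.priority}{"\n"}'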
Also, I have inspected the nodes and their node-pressure conditions. All the pressure conditions return as false, so I don’t believe this is caused by node-pressure eviction.
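For example, the checks looked something like this (the node name is a placeholder):

# Conditions section of a single node (MemoryPressure, DiskPressure, PIDPressure, Ready)
kubectl describe node my-node-1 | grep -A 10 'Conditions:'

# The same pressure conditions across all nodes
kubectl get nodes -o custom-columns='NODE:.metadata.name,MEMORY:.status.conditions[?(@.type=="MemoryPressure")].status,DISK:.status.conditions[?(@.type=="DiskPressure")].status,PID:.status.conditions[?(@.type=="PIDPressure")].status'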
Below is an example of a pod that has this behaviour:
---
apiVersion: v1
kind: Pod
metadata:
  name: testingpod
  namespace: default
spec:
  priorityClassName: high-priority-nonpreempting
  restartPolicy: Never
  dnsPolicy: None
  dnsConfig:
    nameservers:
      - 8.8.8.8
  containers:
    - name: container
      image: imageuri
      resources:
        limits:
          cpu: 2
          memory: 4G
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: zone
                operator: In
                values:
                  - zone-1
        - weight: 90
          preference:
            matchExpressions:
              - key: zone
                operator: In
                values:
                  - zone-2
Since the affinity uses preferredDuringSchedulingIgnoredDuringExecution, it should not be the cause of the terminations: IgnoredDuringExecution means the rule is only evaluated at scheduling time, so a pod that is already running is not removed when a node's labels change; only newly scheduled pods have to satisfy the updated rules.
However, if a pod uses more resources than the node actually has available, it can still be evicted by the kubelet, and a container that exceeds its memory limit will be OOM-killed.
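To confirm what is actually terminating the pods, the pod's last state and the namespace events are worth checking; something along these lines, using the pod from the example above:

# Last state of the container (e.g. OOMKilled) and any eviction message
kubectl describe pod testingpod -n default

# Recent events in the namespace, sorted by time (look for Evicted, Killing or Preempted)
kubectl get events -n default --sort-by=.lastTimestamp

# An evicted pod usually remains in the API in Failed phase with reason "Evicted"
kubectl get pod testingpod -n default -o jsonpath='{.status.phase}{" "}{.status.reason}{"\n"}'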