Pod Stuck in Terminating State Due to iptables 'Chain Already Exists' Error in Kubernetes


I'm facing an unusual issue with a Kubernetes deployment using the Mailu Helm chart, specifically the mailu-front component. After updating the deployment, the newly created pod works fine, but the old pod gets stuck in the 'terminating' state. The Kubernetes event log shows the following error related to pod termination:

error killing pod: failed to "KillPodSandbox" for "237aa644-7634-4fa2-a538-f973e7f7dfab" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"cbbcf6eaaf47c0ef5b92b97187276d04def7bcef3e68b92889b7993ba970ce55\": plugin type=\"portmap\" failed (delete): could not teardown ipv4 dnat: running [/usr/sbin/iptables -t nat -N CNI-DN-0e851981d24bd2d807e1a --wait]: exit status 1: iptables: Chain already exists.\n" 

I'm running a small kubeadm 1.28 cluster with the Weave CNI and Debian 11 as the host OS on all nodes.

Attempts to force delete the pod have been unsuccessful. The issue appears to be related to iptables state on the node, specifically a conflict where the network teardown fails because a chain already exists. This is contradictory on its face, since the command shown in the event log (`iptables -N`) creates a chain rather than deleting one.
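For reference, the force delete I attempted was along these lines (pod name and namespace are placeholders for my actual values):

```bash
# Placeholder pod name and namespace; substitute the actual values.
kubectl delete pod mailu-front-xxxxxxxx-yyyyy -n mailu --grace-period=0 --force

# Check whether the KillPodSandbox error reappears in the recent events.
kubectl get events -n mailu --sort-by=.lastTimestamp | tail
```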

I've not found much information on how to resolve this particular error.

If I log into the host OS of the affected node and run `sudo iptables -t nat -N CNI-DN-0e851981d24bd2d807e1a --wait`, I can indeed see the issue reflected in the output:

iptables: Chain already exists.

If I try to delete the chain with `sudo iptables -t nat -X CNI-DN-0e851981d24bd2d807e1a --wait`, iptables says the chain is still in use:

iptables v1.8.7 (nf_tables):  CHAIN_USER_DEL failed (Device or resource busy): chain CNI-DN-0e851981d24bd2d807e1a
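Presumably the chain is still referenced from somewhere else in the nat table, which would explain the "resource busy" error. Listing the nat rules and grepping for the chain name (taken from the error above) should show any rules that jump to it:

```bash
# List all nat-table rules and look for references to the stuck chain.
sudo iptables -t nat -S | grep CNI-DN-0e851981d24bd2d807e1a
```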

I tried restarting the node, but the problem still occurs.

There is 1 answer below.

Answered by Kiran Kotturi:

It looks like the cleanup from the previous pod's termination never completed. Force deleting the pod can cause unexpected results, such as resource leaks, because the cleanup process is left incomplete.

Try the diagnostic steps below to clean up the related resources on the Kubernetes node while the pod is still stuck in the Terminating state (a command sketch follows the list).

  • SSH into the target Kubernetes node.

  • Execute `crictl ps` to confirm the CONTAINER_ID.

  • Execute `crictl pods` to confirm the POD_ID.

  • Execute `crictl stop <CONTAINER_ID>` to stop the container.

  • The related resources should then be deleted automatically by the kubelet, without user intervention.

  • Wait a few minutes and double-check whether the container and pod still exist. If they do, execute `crictl rm <CONTAINER_ID>` and `crictl rmp <POD_ID>` to remove them.
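A minimal sketch of that sequence, assuming the stuck pod's name contains mailu-front and using placeholder IDs:

```bash
# Run on the affected node. Replace the placeholder IDs with the real ones
# printed by crictl.
crictl pods | grep mailu-front     # note the POD_ID of the stuck sandbox
crictl ps -a | grep mailu-front    # note the CONTAINER_ID(s)

crictl stop <CONTAINER_ID>         # stop the container

# If the container and pod are still listed after a few minutes:
crictl rm <CONTAINER_ID>           # remove the container
crictl rmp <POD_ID>                # remove the pod sandbox
```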

You can refer to the official documentation for more information.

UPDATE:

As per this link, you can try `iptables -F <CHAINNAMEHERE>` followed by `iptables -X <CHAINNAMEHERE>`.

It appears `-F` flushes all the rules in a chain but does not delete the chain itself, so it is necessary to follow up with `-X`.
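For the chain from the question, that would look roughly as follows (note the `-t nat`, since the chain lives in the nat table). If `-X` still reports the chain as busy, another chain, typically the portmap plugin's CNI-HOSTPORT-DNAT chain, may still have a jump rule pointing at it, and that rule would need to be removed first:

```bash
# Flush all rules inside the stuck chain, then delete the now-empty chain.
sudo iptables -t nat -F CNI-DN-0e851981d24bd2d807e1a
sudo iptables -t nat -X CNI-DN-0e851981d24bd2d807e1a
```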