I upgraded my cluster's OS from Ubuntu 18.04 LTS to Ubuntu 20.04 LTS. After the upgrade, the nodes of our bare-metal Kubernetes cluster are not coming up and remain in the NotReady state.
$ kubectl get nodes -o wide
NAME       STATUS     ROLES                  AGE    VERSION    INTERNAL-IP       EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
server01   Ready      control-plane,master   364d   v1.23.16   175.211.213.121   <none>        Ubuntu 20.04.6 LTS   5.4.0-132-generic   docker://20.10.7
server02   NotReady   worker1                299d   v1.23.16   175.211.213.122   <none>        Ubuntu 18.04.6 LTS   5.4.0-132-generic   docker://20.10.7
server03   NotReady   worker2                364d   v1.23.16   175.211.213.123   <none>        Ubuntu 18.04.6 LTS   5.4.0-132-generic   docker://23.0.6
server04   NotReady   worker3                364d   v1.23.16   175.211.213.124   <none>        Ubuntu 18.04.6 LTS   5.4.0-132-generic   docker://20.10.7
$ kubectl get pods -n kube-system -o wide
NAME                               READY   STATUS    RESTARTS        AGE    IP                NODE       NOMINATED NODE   READINESS GATES
coredns-bd6b6df9f-4gc8m            1/1     Running   14 (55m ago)    364d   10.244.0.30       server01   <none>           <none>
coredns-bd6b6df9f-9kh94            1/1     Running   13 (55m ago)    364d   10.244.0.29       server01   <none>           <none>
etcd-server01                      1/1     Running   236 (55m ago)   364d   175.211.213.121   server01   <none>           <none>
kube-apiserver-server01            1/1     Running   234 (55m ago)   364d   175.211.213.121   server01   <none>           <none>
kube-controller-manager-server01   1/1     Running   15 (55m ago)    364d   175.211.213.121   server01   <none>           <none>
kube-proxy-6rs8g                   1/1     Running   1 (128d ago)    299d   175.211.213.122   server02   <none>           <none>
kube-proxy-lfks4                   1/1     Running   13 (55m ago)    364d   175.211.213.121   server01   <none>           <none>
kube-proxy-r6nkn                   1/1     Running   4 (128d ago)    364d   175.211.213.124   server04   <none>           <none>
kube-proxy-xlzzf                   1/1     Running   6 (128d ago)    364d   175.211.213.123   server03   <none>           <none>
kube-scheduler-server01            1/1     Running   15 (55m ago)    364d   175.211.213.121   server01   <none>           <none>
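Note that no flannel pods appear in kube-system at all. To check whether the flannel DaemonSet still exists, I believe these commands should work (assuming a recent flannel manifest, which deploys into its own kube-flannel namespace; older manifests used kube-system):

$ kubectl get daemonset -A | grep -i flannel
$ kubectl get pods -n kube-flannel -o wide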
$ kubectl describe node server01
Conditions:
Type                 Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
----                 ------    -----------------                 ------------------                ------              -------
NetworkUnavailable   False     Sun, 24 Sep 2023 07:57:38 +0530   Sun, 24 Sep 2023 07:57:38 +0530   FlannelIsUp         Flannel is running on this node
MemoryPressure       Unknown   Tue, 12 Dec 2023 11:04:12 +0530   Tue, 12 Dec 2023 11:09:22 +0530   NodeStatusUnknown   Kubelet stopped posting node status.
DiskPressure         Unknown   Tue, 12 Dec 2023 11:04:12 +0530   Tue, 12 Dec 2023 11:09:22 +0530   NodeStatusUnknown   Kubelet stopped posting node status.
PIDPressure          Unknown   Tue, 12 Dec 2023 11:04:12 +0530   Tue, 12 Dec 2023 11:09:22 +0530   NodeStatusUnknown   Kubelet stopped posting node status.
Ready                Unknown   Tue, 12 Dec 2023 11:04:12 +0530   Tue, 12 Dec 2023 11:09:22 +0530   NodeStatusUnknown   Kubelet stopped posting node status.
On server01, the kubelet journal shows repeated flannel CNI errors:
$ journalctl -u kubelet
Jan 30 14:51:44 server01 kubelet[2172]: E0130 14:51:44.950895 2172 cni.go:362] "Error adding pod to network" err="loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory" pod="kube-system/coredns-bd6b6df9f-4gc8m" podSandboxID={Type:docker ID:f6d0be4aac0f8ad199f806098e13ed900586ff8eae44a6415824a1da926c3cfa} podNetnsPath="/proc/6247/ns/net" networkType="flannel" networkName="cbr0"
Jan 30 14:51:44 server01 kubelet[2172]: E0130 14:51:44.953457 2172 cni.go:362] "Error adding pod to network" err="loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory" pod="kube-system/coredns-bd6b6df9f-9kh94" podSandboxID={Type:docker ID:49cf7e42be9c6d4f58abeeb628e72b8854716da96dd70c038f13302e54dc022c} podNetnsPath="/proc/6242/ns/net" networkType="flannel" networkName="cbr0"
Jan 30 14:51:45 server01 kubelet[2172]: E0130 14:51:45.064193 2172 remote_runtime.go:209] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to set up sandbox container \"f6d0be4aac0f8ad199f806098e13ed900586ff8eae44a6415824a1da926c3cfa\" network for pod \"coredns-bd6b6df9f-4gc8m\": networkPlugin cni failed to set up pod \"coredns-bd6b6df9f-4gc8m_kube-system\" network: loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory"
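For reference, /run/flannel/subnet.env is normally written by the flannel pod when it starts on a node, so its absence suggests flannel is not running. On a healthy node with the default 10.244.0.0/16 pod CIDR (which matches the coredns pod IPs above) the file would look roughly like this; the exact MTU and subnet values here are an assumption:

FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true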
On all the worker servers, the kubelet service shows the following status:
# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since Tue 2024-01-30 15:36:36 IST; 3s ago
Docs: https://kubernetes.io/docs/home/
Process: 30089 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
Main PID: 30089 (code=exited, status=1/FAILURE)
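Since systemctl status only shows the exit code while the service is stuck in auto-restart, the actual startup error should be visible in the journal. This is a generic check, not specific to my setup:

# journalctl -u kubelet -n 50 --no-pager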
I have rebooted the servers and executed swapoff -a, but that did not solve the issue. How can I debug this further to resolve it?
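For completeness, these are the checks I ran to confirm swap stays off (standard commands; the sed line assumes the swap entry lives in /etc/fstab, which may differ on your systems):

# swapoff -a
# sed -i '/ swap / s/^/#/' /etc/fstab   # comment out the swap line so it survives reboots (assumes a matching fstab entry)
# free -h                               # the Swap row should show 0B used and 0B total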