I have a k8s cluster with one master and 3 worker nodes. I followed the official documentation to set up the RabbitMQ cluster and operator for k8s. On the first try, everything worked well. Later I needed some changes and cleaned up the namespace to redeploy everything. Now if I deploy my YAML file with 3 replicas, 2 pods work and the 3rd restarts forever. If I reduce it to 2 replicas, one works and the other keeps restarting. There is no error in the logs, but when I run the describe command on the pending pod I find this warning:
Warning FailedScheduling 2m43s (x4 over 7m53s) default-scheduler 0/4 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 3 Insufficient memory. preemption: 0/4 nodes are available: 1 Preemption is not helpful for scheduling, 3 No preemption victims found for the incoming pod.
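As I understand it, the "Insufficient memory" part of this warning is computed from pod resource requests against each node's allocatable memory, not from actual free RAM, so it can appear even when the nodes look mostly idle. This is roughly how I tried to see the scheduler's view (the pod name follows the operator's <cluster-name>-server-N convention and is an assumption on my part):

```shell
# Requests already placed on each node versus its allocatable capacity;
# the scheduler rejects a node when (requested + this pod's request) > allocatable.
kubectl describe nodes | grep -A 8 "Allocated resources"

# The memory request of the pending pod:
kubectl get pod my-rabbitmq-server-2 -n rabbitmq-system \
  -o jsonpath='{.spec.containers[0].resources.requests.memory}'
```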
This is very strange behavior, as all of the nodes have equal resources. I have checked all the nodes and they have about 70% of RAM and CPU free. This is my YAML file:
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: my-rabbitmq
  namespace: rabbitmq-system
spec:
  image: rabbitmq:latest
  replicas: 3
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - rabbitmq
          topologyKey: "kubernetes.io/hostname"
Another strange behavior I see is that I am not able to delete any pod or PVC. When I try to delete them, they get stuck in the Terminating state, so I have to delete them forcefully.
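The forced cleanup I end up doing looks roughly like this (pod and PVC names follow the operator's naming convention and are examples, not exact copies from my cluster):

```shell
# Force-delete a pod stuck in Terminating:
kubectl delete pod my-rabbitmq-server-0 -n rabbitmq-system \
  --grace-period=0 --force

# PVCs often hang on the kubernetes.io/pvc-protection finalizer;
# clearing the finalizers lets the deletion complete:
kubectl patch pvc persistence-my-rabbitmq-server-0 -n rabbitmq-system \
  -p '{"metadata":{"finalizers":null}}'
```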
What I have done: I have checked all the nodes, and there are more than enough resources available. I have checked the state of each node, and all are healthy. I have checked the networking, and everything works. I have checked the control-plane node and the scheduler pod; both show healthy with no errors. What I want: help me find out why RabbitMQ is not running all the replicas, and why the scheduler is not able to place the pods even though we have enough resources.
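In case it is relevant: since my manifest does not set spec.resources, I believe the cluster operator injects its own default resource requests into the generated StatefulSet, which could make the per-pod request larger than the raw free-RAM numbers suggest. This is how I would check what the operator actually applied (the StatefulSet name follows the operator's <cluster-name>-server convention):

```shell
# Resources the operator stamped on the generated StatefulSet:
kubectl get statefulset my-rabbitmq-server -n rabbitmq-system \
  -o jsonpath='{.spec.template.spec.containers[0].resources}'
```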