Why am I not able to redeploy a RabbitMQ cluster on K8s?


I have a k8s cluster with one master and 3 worker nodes. I followed the official documentation to set up the RabbitMQ cluster operator and a RabbitMQ cluster on k8s. On the first try, everything worked well. Later I needed to make some changes, so I cleaned up the namespace to redeploy everything. Now, if I deploy my YAML file with 3 replicas, 2 work and the 3rd restarts forever. If I reduce it to 2 replicas, 1 works and the other keeps restarting. There is no error in the logs, but when I use the describe command on the pod I find this warning.

  Warning  FailedScheduling  2m43s (x4 over 7m53s)  default-scheduler  0/4 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 3 Insufficient memory. preemption: 0/4 nodes are available: 1 Preemption is not helpful for scheduling, 3 No preemption victims found for the incoming pod.

This is very strange behavior, as all of the nodes have equal resources. I have checked all the nodes, and each has about 70% of its RAM and CPU free. This is my YAML file.

apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: my-rabbitmq
  namespace: rabbitmq-system
spec:
  image: rabbitmq:latest
  replicas: 3
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - rabbitmq
          topologyKey: "kubernetes.io/hostname"
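
A possible explanation for the "Insufficient memory" event: the scheduler compares a pod's resource requests against what is still unreserved on each node, not against the node's current usage, so nodes that look 70% free can still be rejected. As far as I understand the operator's documented defaults, each RabbitMQ pod requests 1 CPU and 2Gi of memory when `spec.resources` is omitted. The fragment below is a minimal sketch of overriding those requests in the same manifest; the values are hypothetical and would need to be sized for the actual nodes and workload.

```yaml
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: my-rabbitmq
  namespace: rabbitmq-system
spec:
  replicas: 3
  # Assumed values for illustration only; pick numbers that fit
  # each worker's allocatable memory and the expected load.
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "1"
      memory: 1Gi   # memory limit kept equal to the request
```

Comparing these requests with the "Allocated resources" section of `kubectl describe node` for each worker should show whether a third pod can actually fit.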

Another strange behavior I see is that I am not able to delete any pod or PVC. When I try to delete them, they get stuck in a Terminating state, so I have to delete them forcefully.

What I have done: I have checked all the nodes and their resources, and there are more than enough resources available. I have checked the state of each node, and all are healthy. I have checked the networking, and everything is working fine. I have checked the control plane node and the scheduler pod, and both show healthy with no errors.

What I want: Help me find out why RabbitMQ is not running all of its replicas, and why the pods cannot be scheduled even though we have enough resources.
