We have controller logs from a 3-node Kafka cluster (version 0.10) that got into a bad state after its ZooKeeper session expired:
{noformat}
[2024-01-09 18:33:42,897] INFO [SessionExpirationListener on 2], ZK expired;
shut down all controller components and try to re-elect
(kafka.controller.KafkaController$SessionExpirationListener)
[2024-01-17 05:06:13,469] INFO [SessionExpirationListener on 2], ZK expired;
shut down all controller components and try to re-elect
(kafka.controller.KafkaController$SessionExpirationListener)
[2024-01-17 16:33:43,349] INFO [SessionExpirationListener on 2], ZK expired;
shut down all controller components and try to re-elect
(kafka.controller.KafkaController$SessionExpirationListener)
[2024-01-17 16:33:44,059] INFO [Controller 2]: Broker 2 starting become
controller state transition (kafka.controller.KafkaController)
{noformat}
It is not clear what can cause "shut down all controller components and try to re-elect", but it seems to be a Kafka issue rather than a ZooKeeper issue.
In any case, the relevant code registers a listener for ZooKeeper's session-expiration event (registerSessionExpirationListener); when the session expires and a new session is established, the controller resigns and re-runs the election:
{noformat}
/**
 * Called after the zookeeper session has expired and a new session has been created. You would have to re-create
 * any ephemeral nodes here.
 *
 * @throws Exception On any error.
 */
@throws(classOf[Exception])
def handleNewSession() {
  info("ZK expired; shut down all controller components and try to re-elect")
  inLock(controllerContext.controllerLock) {
    onControllerResignation()
    controllerElector.elect
  }
}
{noformat}
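The resign-then-elect sequence in handleNewSession can be sketched as a self-contained simulation (plain Java; ControllerSim and its method names are hypothetical stand-ins for illustration, not the real Kafka classes):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Minimal simulation of the controller's session-expiry handling:
// on a new session, resign under the controller lock, then re-elect.
class ControllerSim {
    private final ReentrantLock controllerLock = new ReentrantLock();
    final List<String> transitions = new ArrayList<>();

    void onControllerResignation() { transitions.add("resigned"); }
    void elect()                   { transitions.add("elected"); }

    // Mirrors the shape of handleNewSession() above.
    void handleNewSession() {
        controllerLock.lock();
        try {
            onControllerResignation();
            elect();
        } finally {
            controllerLock.unlock();
        }
    }
}

public class Main {
    public static void main(String[] args) {
        ControllerSim sim = new ControllerSim();
        sim.handleNewSession();
        System.out.println(sim.transitions); // resignation always precedes election
    }
}
```

The lock matters in the real code: it serializes the resignation and re-election against other controller state transitions on the same broker.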
A possible consequence of the described expiration is that the broker's session to ZooKeeper expires and the broker does not actively initiate a reconnection.
But the question remains: what can cause the controller to log this "shut down all controller components and try to re-elect" message?
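For context, a session expires when the broker fails to heartbeat ZooKeeper within the session timeout (commonly due to a long GC pause or a network problem), so the broker-side setting to check is zookeeper.session.timeout.ms. A minimal server.properties fragment, with the value assumed to be the 0.10 default:

```properties
# server.properties — ZooKeeper session timeout (6000 ms is the assumed 0.10 default).
# Pauses or network outages longer than this expire the session and trigger
# the "shut down all controller components and try to re-elect" path.
zookeeper.session.timeout.ms=6000
```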