We are running an Akka cluster in Docker on Mesos. Three different applications (each with 4 instances) talk to each other within the cluster.
When we want to do a deployment, we use Marathon's upgrade strategy feature. It is configured so that it creates a new node with the latest deployment, then kills one of the old nodes, and repeats this until all the new nodes are up. We use the configuration below to achieve this (for 4 nodes):
"upgradeStrategy": {
"minimumHealthCapacity": 1,
"maximumOverCapacity": 0.3
},
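A rough model of how these two settings bound the rollout may help explain the one-in, one-out behaviour. It assumes Marathon rounds the healthy floor (`instances × minimumHealthCapacity`) up and truncates the capacity cap (`instances × (1 + maximumOverCapacity)`), and that each new instance is healthy as soon as it starts. That rounding is an assumption chosen because it reproduces the behaviour described above, not a statement of Marathon's exact implementation; `upgrade_bounds` and `rolling_steps` are hypothetical helpers.

```python
import math


def upgrade_bounds(instances, minimum_health_capacity, maximum_over_capacity):
    """Return (min_healthy, max_capacity) under the ASSUMED rounding rules."""
    # Assumed: the floor of healthy instances is rounded up.
    min_healthy = math.ceil(instances * minimum_health_capacity)
    # Assumed: the cap on old + new instances running at once is truncated.
    max_capacity = int(instances * (1 + maximum_over_capacity))
    return min_healthy, max_capacity


def rolling_steps(instances, minimum_health_capacity, maximum_over_capacity):
    """Simulate a rollout; each step is (started, killed, old_left, new_running).

    Simplification: new instances count as healthy immediately after starting.
    """
    min_healthy, max_capacity = upgrade_bounds(
        instances, minimum_health_capacity, maximum_over_capacity
    )
    old, new, steps = instances, 0, []
    while old > 0 or new < instances:
        # Start as many new instances as the capacity cap allows.
        started = max(0, min(max_capacity - (old + new), instances - new))
        new += started
        # Kill old instances only while the healthy total stays >= the floor.
        killed = max(0, min(old, old + new - min_healthy))
        old -= killed
        if started == 0 and killed == 0:
            # This simple model cannot make progress with these settings.
            raise ValueError("no progress possible with these settings")
        steps.append((started, killed, old, new))
    return steps
```

Under this model the settings above give a floor of 4 healthy instances and a cap of 5 running, so every step starts one new instance and then kills one old one. Capacity never drops, but each of the four kills still removes a member abruptly from the other applications' point of view, which is where the failure-detector delay shows up.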
Our main goal is to have as few failures as possible during deployment. However, it takes some time for nodes in the other applications to learn that a node has been killed, and in the meantime some traffic is still directed to that node, which eventually fails. We tuned the cluster failure detector to reduce this time, but we still see a significant percentage of failures during the deployment window.
What can be done to handle this? Is there a way to trap the signal from Mesos and remove the node gracefully from the cluster?
What I would probably do is use Akka Management with `akka.management.http.route-providers-read-only` set to `false`. This exposes the Akka Cluster Management HTTP endpoints, which allow you to change cluster state via HTTP calls. The HTTP endpoint of interest is
`DELETE /cluster/members/{address}`, where `{address}` is a cluster URI like `akka://<actor-system-name>@<host>:<port>`. Depending on the particulars of your Marathon deployment, the IP address and port are available as environment variables to the Docker entrypoint. Thus, you can modify your application launch script so that, after the application exits, it sends that DELETE request for the node's own address to a member that is still running.

There will still be a window while the application shuts down during which the other nodes believe it is still up, but this will likely be shorter than waiting for the failure detector to judge the node down. And if you have an application where you'd like to minimize false-positive failures (e.g. one which makes heavy use of persistent actors), you can loosen the failure detection thresholds while still getting quick removal of nodes during planned deployments.
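A launch wrapper along those lines could look like the sketch below. It assumes several things not in the answer: that `HOST` and `PORT0` hold this instance's host and Akka remoting port (check how your Marathon app maps ports), and two hypothetical variables you would have to provide yourself, `AKKA_SYSTEM_NAME` and `SURVIVOR_MANAGEMENT_URL` (the Akka Management base URL of some other, still running member). It also forwards SIGTERM to the application, because a wrapper running as PID 1 in a container would otherwise swallow the signal Mesos sends on kill.

```python
import os
import signal
import subprocess
import sys
import urllib.request


def member_address(system_name: str, host: str, port: str) -> str:
    """Build a cluster member address of the form akka://<system>@<host>:<port>."""
    return f"akka://{system_name}@{host}:{port}"


def leave_request(management_url: str, address: str) -> urllib.request.Request:
    """Build DELETE /cluster/members/{address} against another member's endpoint.

    The address is appended unencoded (an assumption about what the route accepts).
    """
    url = management_url.rstrip("/") + "/cluster/members/" + address
    return urllib.request.Request(url, method="DELETE")


def run_then_leave(command: list[str]) -> int:
    # Start the real application (e.g. the JVM) as a child process.
    child = subprocess.Popen(command)
    # Forward SIGTERM (sent by Mesos when killing the task) so the app can shut down.
    signal.signal(signal.SIGTERM, lambda signum, frame: child.send_signal(signum))
    exit_code = child.wait()

    # HYPOTHETICAL names: AKKA_SYSTEM_NAME and SURVIVOR_MANAGEMENT_URL are placeholders;
    # HOST and PORT0 are ASSUMED to be this instance's host and Akka remoting port.
    address = member_address(
        os.environ["AKKA_SYSTEM_NAME"], os.environ["HOST"], os.environ["PORT0"]
    )
    request = leave_request(os.environ["SURVIVOR_MANAGEMENT_URL"], address)
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            print(f"removed {address}: HTTP {response.status}", file=sys.stderr)
    except OSError as err:  # URLError, HTTPError and timeouts are all OSError subclasses
        print(f"could not remove {address}: {err}", file=sys.stderr)
    return exit_code


# Only act as a launcher when given a command, e.g.: python launch.py java -jar app.jar
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_then_leave(sys.argv[1:]))
```

The wrapper returns the application's own exit code, so Marathon still sees the real outcome of the task; a failed DELETE is only logged, leaving the failure detector as the fallback.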