Vert.x web and micro services - Health check being starved

80 Views Asked by At

We use a very custom framework built on Vert.x to build our k8s micro services. This framework does a lot of the heavy lifting for teams, such as setting up all the endpoints and creating the health check endpoint, among other things.

One issue we see is that some of the micro services will start to "starve out" the health check when they get overloaded. So, as an app takes on heavy traffic, the health check, which runs every 20s, will time out. For these micro services I have checked that the teams are properly setting the "blocking" on endpoints that do any sort of blocking calls, like DB reads/writes or downstream API call.

The health check endpoint, being comprised of quick checks, does not. My understanding is that blocking handlers get pushed off to the work queue and non-blocking stay in the event loop, so my theory is that under heavy strain the event loop is filling up, and by the time it gets to the queued health check it's already past the timeout. I say this because we see the timeout of the Kubernetes-side, but out total-processing-time on the health check, which starts once the handler is called, is quick.

I attempted to alleviate this by pushing the health check into it's own Verticle, not quite understanding that you can't have multiple vertices on the same port (that was a misunderstanding on my part in reading the documentation).

So, my question is: What is the correct way to prioritize the health check? Is there a way to push these health checks to the front of the queue, or should we be looking more to some sort of "tuning"?

0

There are 0 best solutions below