Java application on Kubernetes (GCP) using HikariCP, losing one connection out of the pool daily


I'm running a Java service that uses Apache Cayenne, deployed on Kubernetes in GCP.

The service is made up of two Java apps, each maintaining its own connections to the database. The first app provides a set of REST endpoints. The second app runs jobs over the course of the day that examine the database and generate files, which are used for batch processing. The applications have been in production since 2018 and had run flawlessly until recently.

The API application handles a small number of REST requests, at most a couple of hundred a day, which trigger SQL selects. The job-processing app does daily batch processing: it creates roughly 10 to 30 thousand records and writes them out as a request file. The remote system takes those entries, acts on them, and produces an equal-sized response file that our app reads in and uses to update the rows from the request file. The generated batch files vary in size from a few thousand to over 100,000 records daily.

Then, this last December, a problem started. The first lost connection happened around the 8th of December. It ran like that until the 23rd of December, then started losing one connection from the pool per day until the 31st, when there were no connections left and the app had to be restarted. There was no release around the time things started going wrong; the last release to production happened in early November and was only a db-deploy script for some end-of-year SQL.

Starting January 1 of this year, the Hikari connection pool has kept losing connections at a rate of about one per day, which means we have to restart the application every 10 days or so (we restarted on the 1st, the 10th, and the 20th).

We're using the default max connection lifetime (maxLifetime, 30 minutes), so it's not clear to me why Hikari isn't noticing the unavailable connections and replacing them, but that's clearly not happening. maximumPoolSize is set to 10 and minimumIdle is set to 10.
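For reference, here's roughly how the pool is configured, plus the settings I've been reading about as possible fixes. The JDBC URL and credentials are placeholders, and the keepaliveTime, leakDetectionThreshold, and validationTimeout values are things we do not currently set, just candidates I'm considering:

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolSetup {
    public static HikariDataSource createDataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://db-host:5432/mydb"); // placeholder
        config.setUsername("app");                                // placeholder
        config.setPassword("secret");                             // placeholder

        // What we run with today:
        config.setMaximumPoolSize(10);
        config.setMinimumIdle(10);
        // maxLifetime left at the default of 30 minutes

        // Candidate settings (not currently set):
        // keepaliveTime pings idle connections so a silently dropped TCP
        // connection gets noticed and replaced (available in HikariCP 4.0.1+).
        config.setKeepaliveTime(300_000);          // 5 minutes
        // leakDetectionThreshold logs a stack trace when a connection is
        // borrowed and not returned within the threshold -- useful for
        // telling a leaked connection apart from a dead one.
        config.setLeakDetectionThreshold(60_000);  // 1 minute
        // validationTimeout bounds the validity check performed on borrow.
        config.setValidationTimeout(5_000);

        return new HikariDataSource(config);
    }
}
```

One thing I'm wondering is whether this is actually a leak rather than dead connections: if some code path borrows a connection and never returns it, maxLifetime would never retire it, which would explain the pool shrinking by one at a time.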

My short-term band-aid fix would be to restart the application daily.

Question 1: Is there a configuration value we could change to ensure that when an entry in the pool becomes unavailable, it gets evicted and replaced?

Question 2: Is there a simple way to restart the nodes of my application on a daily basis, and thus force the application to reset the pool daily?
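For question 2, instead of restarting the pods, I was wondering whether a scheduled in-process soft eviction would achieve the same reset. This is just a sketch of what I had in mind (the 24-hour period is arbitrary):

```java
import com.zaxxer.hikari.HikariDataSource;

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PoolRecycler {
    /**
     * Schedules a daily soft eviction: Hikari marks every pooled connection
     * for eviction and replaces each one as it is returned or found idle,
     * without interrupting in-flight work.
     */
    public static void scheduleDailyEviction(HikariDataSource ds) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                () -> ds.getHikariPoolMXBean().softEvictConnections(),
                24, 24, TimeUnit.HOURS);
    }
}
```

The blunt Kubernetes-side alternative, as I understand it, would be a CronJob that runs `kubectl rollout restart deployment/<name>` once a day, but I'd rather fix the pool itself than paper over it with restarts.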
