I am benchmarking my production application with and without Java 21 virtual threads.
Note that this is not a JMH-style benchmark. I am redirecting 50% of the traffic to each of two sets of Kubernetes deployments: one where the code is served by Tomcat with the virtual threads executor, and one where I am using Tomcat NIO as usual.
I have kept the same HPA for both fleets and I am monitoring the following:
- Number of Pods in each fleet
- Throughput
- CPU utilisation
- p99, p95, p75, p50 latency metrics
- Memory utilisation
- Load Average
Node Details
I am using Google Cloud n2d instances (2nd generation AMD EPYC processors): 16 vCPUs (8 physical cores) and 16 GB of memory.
Application Details
My application is both CPU & IO intensive. It receives a lot of requests; for each one it calls various services to enrich the data, splits the request and sends the parts to another service, and then joins the responses back together to produce the final response. From profiling, most of the CPU time goes into serialisation & deserialisation.
Since the application is IO intensive, virtual threads looked like a good fit for us.
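For illustration, a minimal sketch of switching Tomcat to a virtual-thread-per-task executor (this assumes embedded Tomcat managed by Spring Boot 3.x on Java 21; on Spring Boot 3.2+ the `spring.threads.virtual.enabled=true` property achieves the same thing, and the actual wiring in our deployment may differ):

```java
import java.util.concurrent.Executors;

import org.springframework.boot.web.embedded.tomcat.TomcatProtocolHandlerCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class VirtualThreadConfig {

    // Replace Tomcat's platform-thread worker pool with a virtual-thread-per-task
    // executor, so each request runs on its own virtual thread instead of one of
    // the 300-500 platform worker threads.
    @Bean
    public TomcatProtocolHandlerCustomizer<?> tomcatVirtualThreads() {
        return protocolHandler ->
                protocolHandler.setExecutor(Executors.newVirtualThreadPerTaskExecutor());
    }
}
```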
The implementation without virtual threads is optimised as well: when initiating multiple requests in parallel it forks them asynchronously and then waits for the results in the current thread (a rough sketch of this pattern is shown below). However, we are running 300-500 worker threads in Tomcat. For the async calls we use the Apache HTTP async client and gRPC, both configured with threads = number of processors.
- Average throughput: 2500 qps
- p99 latency: 195 ms
- p95 latency: 115 ms
- p75 latency: 50 ms
- p50 latency: 30 ms
- Average active requests: 84
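To make the fork-then-join pattern concrete, here is a rough sketch of that flow; the client interfaces and method names are illustrative placeholders, not our real code:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class ScatterGatherSketch {

    // Hypothetical async clients standing in for the Apache HTTP async client
    // and gRPC stubs used by the real application.
    interface EnrichmentClient { CompletableFuture<String> enrich(String request); }
    interface DownstreamClient { CompletableFuture<String> process(String part); }

    private final EnrichmentClient enrichment;
    private final DownstreamClient downstream;

    ScatterGatherSketch(EnrichmentClient enrichment, DownstreamClient downstream) {
        this.enrichment = enrichment;
        this.downstream = downstream;
    }

    String handle(String request, List<String> parts) {
        // Fork the enrichment call and the per-part downstream calls asynchronously...
        CompletableFuture<String> enriched = enrichment.enrich(request);
        List<CompletableFuture<String>> partResults =
                parts.stream().map(downstream::process).toList();

        // ...then block in the current Tomcat worker thread until everything
        // completes, which is why 300-500 platform worker threads are needed
        // to keep enough requests in flight.
        StringBuilder response = new StringBuilder(enriched.join());
        partResults.forEach(f -> response.append(f.join()));
        return response.toString();
    }
}
```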
Pod Scheduling
Each pod is scheduled on its own 16-core node, which means one node runs exactly one pod. There are no resource limits set on the pods, so a pod can consume the entire resources of its node.
Observations
Before we conducted this experiment we assumed throughput would increase, since we would no longer be wasting time context switching between 300-500 platform threads.
We did see the following:
- A 50-100 qps improvement in throughput
- Memory reduction (expected, as the number of threads dropped)
- Latency was similar; a slight reduction, but nothing significant
- Load average was down from 17 to 14
The thing I was not able to understand here was: since my threads are now only doing CPU processing, why was my load average not aligning with my CPU utilisation, which was close to 50% (8000-9000m)?
On investigating further and reading more about load averages, I tried supplying:
-XX:ActiveProcessorCount=8
I did this so that it matches my number of physical cores and surprisingly everything changed:
- The throughput improvement remained the same (50-100 qps), but this time it was a consistent difference, whereas earlier it fluctuated up and down
- Further memory reduction as the thread count dropped again: where the common pool, ForkJoinPool, HTTP client, gRPC, etc. were previously sized at 16 threads, each now uses 8 (see the sketch after this list)
- Latency improvements became clearer, with 2-3% gains across all the metrics
- CPU utilisation reduced by about 1 core
- Load average was down to 8 and much closer to the CPU utilisation
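For context on those pool sizes: `-XX:ActiveProcessorCount` changes the value that `Runtime.getRuntime().availableProcessors()` returns, and the JDK and most libraries derive their default thread counts from it. A minimal sketch to confirm this on a pod (the `ioThreads` line is an illustrative stand-in for our own "threads = number of processors" configuration):

```java
import java.util.concurrent.ForkJoinPool;

public class ProcessorCountCheck {

    public static void main(String[] args) {
        // With no flag on a 16-vCPU node this prints 16; with
        // -XX:ActiveProcessorCount=8 the JVM reports 8 instead.
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("availableProcessors = " + cpus);

        // The common ForkJoinPool defaults to one less than availableProcessors(),
        // so it shrinks along with the flag.
        System.out.println("commonPool parallelism = "
                + ForkJoinPool.getCommonPoolParallelism());

        // Libraries and application code that size their pools as
        // "number of processors" typically call the same method,
        // so they drop from 16 to 8 threads as well.
        int ioThreads = cpus; // hypothetical sizing mirroring our config
        System.out.println("io threads = " + ioThreads);
    }
}
```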
I want to understand why this happened and how things are playing out underneath the system. In all my applications that are either async or use virtual threads, I normally set thread count = number of processors, but here, when I instead set thread count = number of physical processors, I see noticeably better results and a lower load average.