I have an Azkaban executor server process, which is a Java service.
When it runs a random sleep script, its CPU usage becomes very high, consistently exceeding 2000%, and the "top" command shows high sys usage.
I captured a jstack dump hoping to find the cause, but many of the stack traces show what look like normal calls.
For example, more than 60 threads are at "at azkaban.execapp.JobRunner.run(JobRunner.java:652)", which is the "Thread.currentThread().setName" call, and 96 threads are at "at java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1019)".
I would expect these to be quick operations that should not cause a bottleneck.
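To make counts like these reproducible, here is a minimal sketch of how I tally the topmost frame of every thread in a dump. The function name `top_frames` and the file name `jstack.txt` are my own placeholders, and it assumes the usual jstack text layout, where each thread starts with a quoted name and its frames are lines beginning with "at".

```shell
# Count how many threads share the same topmost stack frame in a jstack dump.
# Assumes the standard jstack text layout; top_frames is a hypothetical helper name.
top_frames() {
  awk '
    # A thread header starts with a double quote: the next "at" line is its top frame.
    /^"/ { want = 1; next }
    want && /^[ \t]+at / {
      frame = $0
      sub(/^[ \t]+at /, "", frame)   # drop the leading whitespace and "at "
      count[frame]++
      want = 0
    }
    END { for (f in count) printf "%d %s\n", count[f], f }
  ' "$1" | sort -k1,1nr -k2          # highest count first, then by frame name
}
```

Running `top_frames jstack.txt | head -n 20` lists the most common top frames. Threads without any frames (for example GC threads) are skipped.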
The same program, when run on a KVM machine (created by myself) with 10 cores and 86GB of memory, uses around 200% CPU and handles around 700 concurrent tasks.
However, when run on an Alibaba Cloud instance with 32 cores and 128GB of memory, the CPU usage goes over 2000% and it seems to handle only about 400 concurrent tasks.
This makes me suspect a performance issue with the cloud instance. How should I go about troubleshooting this problem?
This is the jstack file from the Alibaba Cloud server:
https://drive.google.com/file/d/1FXPfndCuhVHFKjQUKZYomvaRoZQ5Q5aP/view?usp=drive_link
The "ulimit -a" output on the Alibaba Cloud server is as follows:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 506862
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65535
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 131072
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
The random sleep script:
rnumber=$((RANDOM%240+60));echo "benchmarks shell: sleep $rnumber";sleep $rnumber;
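For reference, the arithmetic in that script can be checked with a small sketch. It assumes bash, where RANDOM is an integer from 0 to 32767, so RANDOM%240 falls between 0 and 239; `sleep_bounds` is just a name I made up for illustration.

```shell
# Work out the shortest and longest sleep the benchmark script can pick.
# Assumes bash semantics: RANDOM%240 yields a remainder in 0..239.
sleep_bounds() {
  min=$((0 + 60))      # smallest remainder plus the fixed 60 s offset
  max=$((239 + 60))    # largest remainder plus the fixed 60 s offset
  echo "$min $max"
}
```

So each task sleeps for somewhere between 60 and 299 seconds and does no work of its own in the meantime.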
Some thread stacks:



