We have a service that listens on an MQTT channel for measurements. After some conversions, these measurements are published to Kafka. This all works fairly well, but there is a memory leak somewhere that I can't pinpoint.
The memory profile of my application can be seen here:
The yellow line corresponds to the direct byte buffer pool.
After investigating the heap dump with MAT and applying this OQL query:

SELECT x AS ByteBuffer, x.capacity AS Capacity, x.limit AS Limit, x.mark AS Mark, x.position AS Position FROM java.nio.DirectByteBuffer x WHERE ((x.capacity > (1024 * 1024)) and (x.cleaner != null))

we see that there are roughly 400 DirectByteBuffers, each with a capacity of roughly 4 MB.
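As a runtime cross-check of the same numbers, the JVM exposes the direct buffer pool through its BufferPoolMXBeans, which is presumably also what the yellow line above is based on. A minimal sketch:

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class DirectBufferPoolDump {
    public static void main(String[] args) {
        // Prints the JVM's buffer pools, typically "direct" and "mapped"
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("%s: count=%d, used=%d bytes, capacity=%d bytes%n",
                    pool.getName(), pool.getCount(), pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}

The "direct" pool's count and total capacity should be in the same ballpark as what the OQL query reports (~400 buffers of ~4 MB each).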
If you follow this to the GC root, you see that it is used by a Kafka producer:

This led me to apply this OQL query: SELECT * FROM org.apache.kafka.common.utils.KafkaThread, which returns the following response:

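As a sanity check on the number of KafkaThread instances, the live producer I/O threads can also be counted from inside the service. A small sketch, assuming the default thread naming used by the producer client ("kafka-producer-network-thread | <client.id>"):

// Sketch: counts live Kafka producer I/O threads, to cross-check against the heap dump.
// Assumes the default naming "kafka-producer-network-thread | <client.id>".
static long countProducerNetworkThreads() {
    return Thread.getAllStackTraces().keySet().stream()
            .filter(t -> t.getName().startsWith("kafka-producer-network-thread"))
            .count();
}

With three producers this should stay at 3.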
The query result seems logical. However, if you dive a bit deeper into the contents of the Kafka producer, you see that it is holding on to a lot of direct byte buffers:

The producers are created here:
import lombok.RequiredArgsConstructor;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.springframework.stereotype.Component;

@Component
@RequiredArgsConstructor
public class BxKafkaProducerFactory {

    private final KafkaProperties kafkaProperties;

    // getKafkaProducerProperties() builds the producer Properties (settings below)
    public <K, V> Producer<K, V> getProducer() {
        return new KafkaProducer<>(getKafkaProducerProperties());
    }
}
We use default settings for almost everything. We use acks=all, a linger time of 100 ms, SASL_PLAINTEXT as the security protocol, PLAIN as the SASL mechanism, KafkaAvroSerializer for both the key and value serializer, and snappy as the compression type. We use the same settings in other (comparable) services where we don't see this memory issue.
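For completeness, the body of getKafkaProducerProperties() boils down to the settings above. A rough sketch using the standard Kafka property constants (the schema registry URL, the SASL JAAS config and the exact wiring of our KafkaProperties are omitted):

import io.confluent.kafka.serializers.KafkaAvroSerializer;
import java.util.Properties;
import org.apache.kafka.clients.CommonClientConfigs;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.config.SaslConfigs;

private Properties getKafkaProducerProperties() {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaProperties.getBootstrapServers());
    props.put(ProducerConfig.ACKS_CONFIG, "all");
    props.put(ProducerConfig.LINGER_MS_CONFIG, 100);
    props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);
    props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");
    props.put(SaslConfigs.SASL_MECHANISM, "PLAIN");
    // schema.registry.url and sasl.jaas.config are set here as well (omitted)
    return props;
}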
We have just 3 producers, created on startup and closed on shutdown, so it is not that we are leaking producers (I checked this by adding a log statement on producer creation). It looks more like the producers themselves are leaking memory.
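One extra data point: each producer's metrics() map exposes its record-accumulator buffer pool (the heap-side pool sized by buffer.memory), so logging it per producer gives something to compare against the direct-memory growth. A sketch; the metric names ("buffer-total-bytes" / "buffer-available-bytes" in the "producer-metrics" group) are based on recent client versions and may differ in yours:

import org.apache.kafka.clients.producer.Producer;

// Sketch: logs the producer's internal (heap-side) record-accumulator buffer metrics.
static void logBufferMetrics(Producer<?, ?> producer) {
    producer.metrics().forEach((name, metric) -> {
        if ("producer-metrics".equals(name.group())
                && ("buffer-total-bytes".equals(name.name()) || "buffer-available-bytes".equals(name.name()))) {
            System.out.println(name.name() + " = " + metric.metricValue());
        }
    });
}

If those numbers stay flat while the direct buffer pool keeps growing, the accumulator pool can at least be ruled out.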
Does anybody have any idea what could cause this, or any good pointers on how to debug this further? As you can see in the first image, the memory is steadily increasing and it doesn't come down even after a week or so.
EDIT:
After some more debugging and following all the byte buffers to their GC roots, we see the following progression:
15:23
Total 18:
GlobalStream: 1
Heartbeat: 1
Java Thread: 1
Nio FastThread: 5
Producer 1: 1
Producer 2: 2
Producer 3: 2
Cleaner: 5

16:04
Total 23:
GlobalStream / Heartbeat: 1
Java Thread: 1
Nio FastThread: 6
Producer 1: 0
Producer 2: 8
Producer 3: 2
Cleaner: 4

17:04
Total 25:
GlobalStream / Heartbeat: 2
Java Thread: 1
Nio FastThread: 7
Producer 1: 2
Producer 2: 8
Producer 3: 4
Cleaner: 0
We clearly see the number of ByteBuffers held by the Kafka producers increasing over time. This also matches the JVM metrics we capture:
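(For reference, the buffer-pool gauges in that graph presumably come from the JVM's standard buffer pool beans; with Micrometer, for example, binding JvmMemoryMetrics exposes them as jvm.buffer.count, jvm.buffer.memory.used and jvm.buffer.total.capacity gauges, and Spring Boot Actuator binds this automatically. A sketch, assuming Micrometer is on the classpath:)

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.binder.jvm.JvmMemoryMetrics;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class BufferPoolGaugesSketch {
    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();
        // Registers jvm.buffer.count, jvm.buffer.memory.used and jvm.buffer.total.capacity,
        // tagged with id=direct / id=mapped
        new JvmMemoryMetrics().bindTo(registry);
        registry.getMeters().forEach(meter -> System.out.println(meter.getId()));
    }
}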

Thanks in advance!
PS: I have also investigated it with jeprof and jemalloc, but couldn't find anomalies there.

