I want to run an on-premises Kafka Connect (KC) cluster that will run multiple different "source" and "sink" connectors. Currently, in my test configuration, I have scaled the KC cluster to 3 instances, each consuming about 60 GB of heap memory.
Are there any recommendations on how to scale Kafka Connect instances effectively? For example, a recommended number of CPUs / amount of RAM per KC instance? A maximum number of instances?
There's no single best practice, as it depends entirely on which connectors you're running.
For example, the S3/HDFS sinks buffer data in memory before flushing, so a large heap is expected there. Also, for sink connectors, you cannot scale tasks beyond the total partition count of the topics they consume. For JDBC/Debezium sources, you cannot have more than one task per source table...
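To make the task-scaling point concrete, here's a minimal S3 sink config as you'd submit it to the Connect REST API. The connector name, topic, bucket, and numbers are made-up examples; the property names are the standard ones for Confluent's S3 sink:

```json
{
  "name": "s3-sink-example",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "orders",
    "tasks.max": "6",
    "s3.bucket.name": "my-bucket",
    "flush.size": "10000"
  }
}
```

If the `orders` topic has 6 partitions, setting `tasks.max` above 6 gains you nothing; the extra tasks sit idle. And since each task buffers up to `flush.size` records per partition before writing to S3, that setting (times your average record size) is roughly what drives the heap requirement.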
If you're simply asking how to manage scaling it, then configuration management like Ansible/Terraform can help, assuming you're not using Kubernetes.
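For instance, a bare-bones Ansible sketch for rolling out a worker config change might look like the following (file paths, service name, and handler are hypothetical, assuming Connect runs as a systemd service):

```yaml
# Render the distributed-mode worker config from a template, then
# restart the service via a handler so changes take effect.
- name: Deploy Kafka Connect worker configuration
  ansible.builtin.template:
    src: connect-distributed.properties.j2
    dest: /etc/kafka/connect-distributed.properties
  notify: restart kafka-connect

- name: Ensure the Kafka Connect service is running
  ansible.builtin.systemd:
    name: kafka-connect
    state: started
    enabled: true
```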
Personally, I've managed a cluster of up to 30 nodes, each server with maybe 64 GB of RAM? I forget the EC2 instance type, but we needed to scale out by several more nodes each holiday season to handle the load, so it may be a bigger cluster now.
I would not recommend starting with -Xmx60g, since garbage collection pauses on a heap that large can cause issues (long stop-the-world pauses can even trigger missed heartbeats and group rebalances). You'll want to set up JVM monitoring and track actual heap usage before sizing up.
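One way to do that is to start the worker with a modest fixed heap and expose JMX for monitoring. A sketch of a startup wrapper, assuming the standard Apache Kafka distribution layout (paths, port, and heap size are examples to adjust):

```shell
# Start with a small heap and grow it only when monitoring shows pressure.
export KAFKA_HEAP_OPTS="-Xms4g -Xmx4g"
# Expose JMX so jconsole/Prometheus JMX exporter can track heap usage and GC.
export JMX_PORT=9999
exec /opt/kafka/bin/connect-distributed.sh /etc/kafka/connect-distributed.properties
```

Setting -Xms equal to -Xmx avoids heap resizing pauses; the JVM's GC and memory-pool MBeans over JMX will tell you whether 4 GB is actually enough for your connector mix.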