I want to run an on-premises Kafka Connect (KC) cluster that will run multiple different "source" and "sink" connectors. Currently, in my test configuration, I have scaled the KC cluster to 3 instances, each consuming about 60 GB of heap memory.
Are there any recommendations on how to scale Kafka Connect instances effectively? For example, a recommended number of CPUs / amount of RAM per KC instance? A maximum number of instances?
There's no single best practice, as it depends entirely on which connectors you're running.
For example, the S3/HDFS sinks buffer data in memory before flushing, so a large heap is expected there. Also, for sink connectors, you cannot scale tasks beyond the total partition count of the topics they consume. For JDBC/Debezium sources, you cannot have more than one task per source table...
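To make the task-scaling point concrete, here's a minimal S3 sink config as you'd submit it to the Connect REST API. The connector name, topic, bucket, and numbers are made-up examples; the property names are the standard ones for Confluent's S3 sink:

```json
{
  "name": "s3-sink-example",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "orders",
    "tasks.max": "6",
    "s3.bucket.name": "my-bucket",
    "flush.size": "10000"
  }
}
```

If the `orders` topic has 6 partitions, setting `tasks.max` above 6 gains you nothing; the extra tasks sit idle. And since each task buffers up to `flush.size` records per partition before writing to S3, that setting (times your average record size) is roughly what drives the heap requirement.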
If you're simply asking how to manage scaling it, then configuration management like Ansible/Terraform can help, assuming you're not using Kubernetes.
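For instance, a bare-bones Ansible sketch for rolling out a worker config change might look like the following (file paths, service name, and handler are hypothetical, assuming Connect runs as a systemd service):

```yaml
# Render the distributed-mode worker config from a template, then
# restart the service via a handler so changes take effect.
- name: Deploy Kafka Connect worker configuration
  ansible.builtin.template:
    src: connect-distributed.properties.j2
    dest: /etc/kafka/connect-distributed.properties
  notify: restart kafka-connect

- name: Ensure the Kafka Connect service is running
  ansible.builtin.systemd:
    name: kafka-connect
    state: started
    enabled: true
```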
Personally, I've managed a cluster of up to 30 nodes, each server with maybe 64 GB of RAM? I forget the EC2 instance type, but we needed to scale out by several more nodes each holiday season to handle the load, so it may be a bigger cluster now.
I would not recommend starting with -Xmx60g, since garbage collection pauses on a heap that large can cause issues (long stop-the-world pauses can even trigger missed heartbeats and group rebalances). You'll want to set up JVM monitoring and track actual heap usage before sizing up.
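One way to do that is to start the worker with a modest fixed heap and expose JMX for monitoring. A sketch of a startup wrapper, assuming the standard Apache Kafka distribution layout (paths, port, and heap size are examples to adjust):

```shell
# Start with a small heap and grow it only when monitoring shows pressure.
export KAFKA_HEAP_OPTS="-Xms4g -Xmx4g"
# Expose JMX so jconsole/Prometheus JMX exporter can track heap usage and GC.
export JMX_PORT=9999
exec /opt/kafka/bin/connect-distributed.sh /etc/kafka/connect-distributed.properties
```

Setting -Xms equal to -Xmx avoids heap resizing pauses; the JVM's GC and memory-pool MBeans over JMX will tell you whether 4 GB is actually enough for your connector mix.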