Say there are 32 partitions in my Azure Event Hub. I have one consumer reading all 32 partitions and checkpointing. Sometimes the volume of incoming messages spikes, and to handle that I was thinking of scaling out the consumers, so there would be 2 consumers in the same consumer group, each reading all 32 partitions.
In the Azure Event Hubs documentation, I read that it is advisable to have only one active receiver on a partition within a consumer group; otherwise, duplicate events may be read.
My questions are,
Even with checkpointing, is it possible for two different consumers to read same event from the same partition?
If so, what is the best way to handle this when the volume of incoming messages scales up? Should I create one consumer that reads only partitions 1-16 and another that reads partitions 17-32 to handle the load?
What is your consumer? Are you using an application on a VM, Azure Functions, or something similar?
Checkpointing is done per partition, per consumer group. If you read the same partition from different consumer groups, each group receives every event independently, so you will definitely get duplicate events that you need to handle. If you have 2 applications using the same consumer group and the same partition, and they coordinate through an event processor that takes ownership (a lease) of each partition, you should not normally get duplicate events, but you may get contention: both applications attempt to read the same partition while it is locked by one of them. Azure Functions handles this automatically.
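If duplicates can still reach your handler (for example, after a restart that resumes from an older checkpoint), one way to handle them is to skip anything at or below the last sequence number already processed for that partition. The sketch below is plain Python with no SDK calls; `PartitionDeduplicator` and `process` are hypothetical names, events are modelled as `(partition_id, sequence_number, body)` tuples, and it assumes events within a partition arrive in increasing sequence-number order. The state lives in memory only, so sharing it across processes would need an external store.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape for this sketch: (partition_id, sequence_number, body)
Event = Tuple[str, int, str]


class PartitionDeduplicator:
    """Tracks the highest sequence number handled per partition (in memory)."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, int] = {}

    def is_new(self, partition_id: str, sequence_number: int) -> bool:
        # Anything at or below the last handled sequence number is a replay.
        last = self._last_seen.get(partition_id, -1)
        if sequence_number <= last:
            return False
        self._last_seen[partition_id] = sequence_number
        return True


def process(events: Iterable[Event], dedup: PartitionDeduplicator) -> List[str]:
    """Return the bodies that were handled, skipping duplicates."""
    handled: List[str] = []
    for partition_id, sequence_number, body in events:
        if dedup.is_new(partition_id, sequence_number):
            handled.append(body)
    return handled
```

Sequence numbers are only comparable within a single partition, which is why the state is keyed by partition ID.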
How sure are you that 2 consumers will be enough for future load? If you are using Azure Functions and set up autoscaling, the contention issues are managed automatically, and the number of instances will scale out to either your configured maximum or the number of partitions, depending on the load. If you are sure that you will need only 2 VMs/applications, then yes, divide and conquer: split the partitions between the 2. This assumes that the event sender is sending events without a partition key or a partition ID, in which case the events are load balanced across all partitions.
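As a rough sketch of the "divide and conquer" split, the helper below divides partition IDs into contiguous, near-equal ranges, one per consumer. `assign_partitions` is a hypothetical helper, not part of any Azure SDK. It assumes partition IDs are the zero-based strings "0" to "31", so the two halves are 0-15 and 16-31 rather than 1-16 and 17-32.

```python
from typing import List


def assign_partitions(partition_count: int, consumer_count: int) -> List[List[str]]:
    """Split partition IDs "0".."partition_count - 1" into contiguous ranges."""
    if partition_count < 1 or consumer_count < 1:
        raise ValueError("partition_count and consumer_count must be >= 1")

    # Each consumer gets `base` partitions; the first `extra` consumers get one more.
    base, extra = divmod(partition_count, consumer_count)
    assignments: List[List[str]] = []
    start = 0
    for index in range(consumer_count):
        size = base + (1 if index < extra else 0)
        assignments.append([str(p) for p in range(start, start + size)])
        start += size
    return assignments


if __name__ == "__main__":
    for consumer, partitions in enumerate(assign_partitions(32, 2)):
        print(f"consumer {consumer}: partitions {partitions[0]}-{partitions[-1]}")
```

Each consumer would then open receivers only for the partition IDs in its own list. A static split like this does not rebalance if one consumer dies, which is the part an event processor or Azure Functions would otherwise handle for you.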