Suppose thread 1 is working on msg0 and thread 2 is working on msg1. Because the threads run in parallel, msg1 gets processed first and its offset is committed, advancing the committed offset from 0 to 1. Meanwhile, processing of msg0 fails (for whatever reason, e.g. the service goes down), so its offset is never committed. When the service comes back up, it resumes from the last committed offset, which came from msg1, and starts processing from msg2. As a result, msg0 is lost.
Now, instead of committing offsets one by one, I want to commit them batch-wise, maybe all at once, keeping small batches per partition. Is that possible?
Tried -> I implemented a circuit breaker that stops consuming events for a while, to be on the safe side, but the data already lost remains a problem.
Expecting -> I don't want to lose any data.
In Apache Kafka, you don't commit each offset individually to begin with. Kafka works differently than a traditional message queue, because it is not a queue, but a streaming platform built on a "log".
Committing offset X means: "I have read everything up to X-1, and X is the next offset I want to consume." Hence, committing offsets is implicitly "batched" already.
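A minimal model of that semantic (no broker involved; the function name is just for illustration): the offset you commit is always "last fully processed offset plus one", so a single commit subsumes all earlier ones.

```python
def offset_to_commit(last_fully_processed: int) -> int:
    """Kafka commit semantics: the committed offset is the NEXT offset to
    consume, i.e. everything strictly below it is acknowledged."""
    return last_fully_processed + 1

# Processing records 0..4 and committing once at the end is equivalent to
# committing after every single record -- the final commit covers them all.
assert offset_to_commit(4) == 5
```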
For this reason, fanning out processing to multiple threads is actually an issue: if you get messages with offsets O and O+1, and O is processed by thread-1 while O+1 is processed by thread-2, then even if thread-2 finishes first it cannot simply commit O+2, because that would mark O as successfully processed, too. -- You can only commit O+2 after both threads have finished processing.
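One common way to handle this is to track which offsets each worker has finished and only ever commit the end of the contiguous processed prefix. A minimal sketch (the `OffsetTracker` class is hypothetical, not part of any Kafka client library; in a real consumer you would call `commitSync`/`commit` with the value returned by `committable()` for each partition):

```python
import threading

class OffsetTracker:
    """Track out-of-order completions for one partition and expose the
    highest offset that is safe to commit: every offset below it is done."""

    def __init__(self, start_offset: int = 0):
        self._lock = threading.Lock()
        self._next = start_offset   # lowest offset not yet fully processed
        self._done = set()          # completed offsets beyond _next

    def mark_done(self, offset: int) -> None:
        with self._lock:
            self._done.add(offset)
            # advance the contiguous frontier as far as possible
            while self._next in self._done:
                self._done.remove(self._next)
                self._next += 1

    def committable(self) -> int:
        """Offset to pass to commit(): everything strictly below is processed."""
        with self._lock:
            return self._next

tracker = OffsetTracker()
tracker.mark_done(1)                 # thread-2 finishes O+1 first
assert tracker.committable() == 0    # cannot commit yet: offset 0 pending
tracker.mark_done(0)                 # thread-1 finishes O
assert tracker.committable() == 2    # now it is safe to commit O+2
```

With this pattern a slow or failed message blocks the commit frontier instead of being silently skipped, so a crash makes the consumer re-read from the first unprocessed offset rather than lose it.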