Suppose I have 5M documents that match my URIs module. When I ran the CoRB process, it processed only 2M records because of a heap size issue. If I run the job again, will it pick up the same 2M records again, or will it continue with the remaining 3M?
Note: I don't have any logic in the code to pick the next set of data on each run.
How can I set it up so that every run picks the next set of records? I am running these jobs manually. Or does CoRB always pick the next set of data by default?
If your client doesn't have enough memory to hold all of the URIs in the queue, you can enable the DISK-QUEUE option.
Enabling that option allows CoRB to spill to disk and use a file to hold the list of URIs to process, rather than holding them all in memory.
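As a rough sketch, the options file for the job might look something like the fragment below. The connection string, module paths, temp directory, and thread count are hypothetical placeholders rather than values from your setup, and the two tuning options next to DISK-QUEUE are optional, so check the CoRB documentation for the version you are running before relying on them.

```properties
# Hypothetical job.properties for illustration; replace the values with your own.
XCC-CONNECTION-URI=xcc://user:password@localhost:8000/Documents
URIS-MODULE=/ext/corb/uris.xqy|ADHOC
PROCESS-MODULE=/ext/corb/process.xqy|ADHOC
THREAD-COUNT=8

# Spill the URI queue to a file on disk instead of holding it all in the heap.
DISK-QUEUE=true
# Optional (assumed tuning values): how many URIs to keep in memory before
# spilling, and where the queue file should be written.
DISK-QUEUE-MAX-IN-MEMORY-SIZE=10000
DISK-QUEUE-TEMP-DIR=/tmp/corb-queue
```

The same option can also be passed as a JVM system property when launching the job, for example `-DDISK-QUEUE=true`.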
Without it, if you are filling up your memory and crashing with Out of Memory errors, then a re-run will most likely just reprocess the same initial set of URIs, unless you have logic in your URIs module that changes the sort order or excludes documents that have already been processed.