I'm using Spring Batch to accomplish the following:
- Reading a large data set from a database
- Making some transformations to each item
- Writing to a target database
I want to implement the read-process-write cycle in chunks so that I don't have to hold every item read in memory at once. I don't know in advance how many items I will be processing: some days it could be thousands, other days millions, so I'm trying to prevent an eventual OOM error.
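For reference, this is roughly the chunk-oriented step I have in mind (a minimal sketch only; the bean names, table names, column names, and the chunk size of 1,000 are placeholders, and it assumes two configured `DataSource` beans and the Spring Batch 5 builder API):

```java
import java.util.Map;

import javax.sql.DataSource;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.ColumnMapRowMapper;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class CopyStepConfig {

    // Cursor-based reader: rows are streamed from the source database,
    // so only the current chunk is kept in memory.
    @Bean
    public JdbcCursorItemReader<Map<String, Object>> reader(DataSource sourceDataSource) {
        return new JdbcCursorItemReaderBuilder<Map<String, Object>>()
                .name("sourceReader")
                .dataSource(sourceDataSource)
                .sql("SELECT id, payload FROM source_table")
                .rowMapper(new ColumnMapRowMapper())
                .build();
    }

    // Stock JdbcBatchItemWriter writing to the target database.
    @Bean
    public JdbcBatchItemWriter<Map<String, Object>> writer(DataSource targetDataSource) {
        return new JdbcBatchItemWriterBuilder<Map<String, Object>>()
                .dataSource(targetDataSource)
                .sql("INSERT INTO target_table (id, payload) VALUES (:id, :payload)")
                .columnMapped()
                .build();
    }

    @Bean
    public Step copyStep(JobRepository jobRepository,
                         PlatformTransactionManager txManager,
                         JdbcCursorItemReader<Map<String, Object>> reader,
                         JdbcBatchItemWriter<Map<String, Object>> writer) {
        return new StepBuilder("copyStep", jobRepository)
                // One transaction per chunk of 1,000 items by default.
                .<Map<String, Object>, Map<String, Object>>chunk(1_000, txManager)
                .reader(reader)
                .processor(item -> item) // my transformation would go here
                .writer(writer)
                .build();
    }
}
```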
My problem with this approach is that I have no way to tell which items have already been fully processed and saved to the target database without introducing an intermediate file or table to track the job's progress. I don't want to add that intermediate mechanism because I'm concerned about its performance impact. So if the job fails part-way through, I don't believe I have a way to restart it and insert only the items that were missed because of the failure.
So I concluded that I have to read and process the items in chunks, writing them to the target database chunk by chunk, but without committing the transaction until the whole job has completed.
To do that, I would like to implement a custom JdbcBatchItemWriter. I've found this post, which implements a class with functionality similar to what I'm after, but it's a bit old and I'm not sure it still follows best practices for the current version of the framework.
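This is the rough shape of the custom writer I have in mind so far (again only a sketch, not a working solution: `TargetTableItemWriter`, the table and column names are placeholders, the items are the `Map` rows from the reader above, and it assumes the Spring Batch 5 `Chunk`-based `ItemWriter` contract):

```java
import java.util.Map;

import javax.sql.DataSource;

import org.springframework.batch.item.Chunk;
import org.springframework.batch.item.ItemWriter;
import org.springframework.jdbc.core.namedparam.MapSqlParameterSource;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;
import org.springframework.jdbc.core.namedparam.SqlParameterSource;

public class TargetTableItemWriter implements ItemWriter<Map<String, Object>> {

    private static final String INSERT_SQL =
            "INSERT INTO target_table (id, payload) VALUES (:id, :payload)";

    private final NamedParameterJdbcTemplate jdbcTemplate;

    public TargetTableItemWriter(DataSource targetDataSource) {
        this.jdbcTemplate = new NamedParameterJdbcTemplate(targetDataSource);
    }

    @Override
    public void write(Chunk<? extends Map<String, Object>> chunk) {
        // One JDBC batch per Spring Batch chunk; the surrounding step/transaction
        // configuration decides when these inserts are actually committed.
        SqlParameterSource[] batch = chunk.getItems().stream()
                .map(MapSqlParameterSource::new)
                .toArray(SqlParameterSource[]::new);
        jdbcTemplate.batchUpdate(INSERT_SQL, batch);
    }
}
```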
Please give me some advice on the right way to implement this custom item writer.