I have a distributed Spring Boot application with a scheduler that starts a long-running task.
What I would like to achieve: whenever the instance that started the job dies or stops responding, another instance takes over the task and continues the work from where it stopped.
Is there a library that makes this easy? Or do I have to do everything by hand: split the task into several smaller parts, checkpoint after each part, constantly serialize the progress to some distributed store, etc.?
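
To make the by-hand approach concrete, here is a minimal sketch of what I have in mind, assuming the task can be split into numbered chunks. Everything in it is hypothetical: the class and method names, the `JobState` record, and the fixed one-second lease are my own, and the in-memory `STORE` map only stands in for a real distributed store (a database row, for example). An instance claims the job by taking a time-limited lease, renews the lease at every checkpoint, and any other instance may take over once the lease has expired. Time is passed in as a parameter only to keep the example deterministic. It needs Java 16+ because of the record.

```java
import java.util.HashMap;
import java.util.Map;

class LeaseCheckpointSketch {

    /** Job state as it would be persisted in the shared store. */
    record JobState(String owner, long leaseExpiresAt, int nextChunk) {}

    // Assumed lease length; a real value depends on how long one chunk takes.
    static final long LEASE_MILLIS = 1_000;

    // Stand-in for a distributed store; NOT shared between real instances.
    static final Map<String, JobState> STORE = new HashMap<>();

    /** Claims the job. Returns the chunk to resume from, or -1 if another instance holds a live lease. */
    static synchronized int tryAcquire(String jobId, String instanceId, long now) {
        JobState cur = STORE.get(jobId);
        if (cur != null && !cur.owner().equals(instanceId) && cur.leaseExpiresAt() > now) {
            return -1;
        }
        int next = (cur == null) ? 0 : cur.nextChunk();
        STORE.put(jobId, new JobState(instanceId, now + LEASE_MILLIS, next));
        return next;
    }

    /** Records a finished chunk and renews the lease. Returns false if the lease was lost. */
    static synchronized boolean checkpoint(String jobId, String instanceId, int finishedChunk, long now) {
        JobState cur = STORE.get(jobId);
        if (cur == null || !cur.owner().equals(instanceId) || cur.leaseExpiresAt() <= now) {
            return false;
        }
        STORE.put(jobId, new JobState(instanceId, now + LEASE_MILLIS, finishedChunk + 1));
        return true;
    }

    /** Processes chunks from the last checkpoint; maxChunks simulates the instance dying early. */
    static int run(String jobId, String instanceId, int totalChunks, int maxChunks, long now) {
        int start = tryAcquire(jobId, instanceId, now);
        if (start < 0) {
            return 0;
        }
        int done = 0;
        for (int chunk = start; chunk < totalChunks && done < maxChunks; chunk++) {
            // The real work for this chunk would go here.
            if (!checkpoint(jobId, instanceId, chunk, now)) {
                break; // someone else took over, stop working
            }
            done++;
        }
        return done;
    }

    public static void main(String[] args) {
        int a = run("job-1", "instance-A", 10, 4, 0);        // A finishes chunks 0..3, then "dies"
        int blocked = run("job-1", "instance-B", 10, 10, 500); // A's lease is still valid
        int b = run("job-1", "instance-B", 10, 10, 2_000);     // lease expired, B resumes at chunk 4
        System.out.println(a + " " + blocked + " " + b);       // prints "4 0 6"
    }
}
```

In a real deployment the two `synchronized` methods would have to become atomic operations on the shared store (a conditional update, for example), and a chunk that was in progress when the owner died would be executed again, so each chunk would need to be idempotent.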
Edit: Spark came to mind, but it might be overkill for this.