I don't understand the correct procedure for introducing momentum into the neural network training update:
suppose you have a dataset of 100 instances and a minibatch size of 90, and you train a NN for 2 epochs. I will denote the iteration index with "_n" and the momentum coefficient with "alpha".
My doubt is the following: is the old delta_w saved across different epochs for the momentum update, or do we reset it to 0 every time we pass from one epoch to the next?
At the start, delta_w is NULL.
Epoch 1, iteration 0: you consider the first 90 instances;
weight_update = delta_w_0 + (alpha * NULL)
Epoch 1, iteration 1: you consider the remaining 10 instances;
weight_update = delta_w_1 + (alpha * delta_w_0)
Epoch 2, iteration 2: you consider 90 instances again;
WHAT IS THE CORRECT UPDATE RULE NOW?
weight_update = delta_w_2 + (alpha * delta_w_1)
or is the old delta_w not considered because it belongs to a previous epoch, so that
weight_update = delta_w_2 + (alpha * NULL)
The notion of an epoch is really old school and not very relevant anymore, as it suggests there is some extra structure to the data that doesn't really exist. To answer your question: momentum just uses the previous iteration's update; it doesn't matter whether that update crossed an epoch boundary, since the concept of an epoch doesn't really exist in (typical) optimisation. So your first option is correct: weight_update = delta_w_2 + (alpha * delta_w_1).
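To make this concrete, here is a minimal NumPy sketch of SGD with momentum under the setup described above (100 instances, minibatches of 90 and 10, 2 epochs). The loss, the learning rate, and names like velocity and lr are illustrative assumptions, not anything from your question; the point is only that the momentum term is initialised once, before training, and is never reset at an epoch boundary:

```python
import numpy as np

# Illustrative assumption: a single weight vector and a squared-error
# linear model; in a real NN these would be per-layer parameters/gradients.
def gradient(w, x_batch, y_batch):
    return 2 * x_batch.T @ (x_batch @ w - y_batch) / len(y_batch)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))    # dataset of 100 instances
y = rng.normal(size=100)
w = np.zeros(5)

alpha = 0.9                      # momentum coefficient
lr = 0.01                        # learning rate (assumed value)
velocity = np.zeros_like(w)      # old delta_w: initialised ONCE, to zero

for epoch in range(2):           # 2 epochs
    # minibatches of 90 and 10 instances, as in the question
    for batch in (slice(0, 90), slice(90, 100)):
        delta_w = -lr * gradient(w, X[batch], y[batch])
        # Momentum update: the previous iteration's update is reused,
        # regardless of whether an epoch boundary was just crossed.
        velocity = delta_w + alpha * velocity
        w += velocity
    # No reset of `velocity` here: it carries over into the next epoch.
```

Note that velocity carries the full history of past updates, so writing weight_update = delta_w_2 + (alpha * delta_w_1) is shorthand for reusing the previous iteration's total update; either way, nothing is zeroed when epoch 2 begins.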