It seems to be common practice in Deep Q-learning to have the target network trail the main network, syncing the two every 100 or so steps, but I am not clear on why that is.
The best explanation I have received as to why is the rather ambiguous:

it prevents the net from chasing its own tail
Is there a mathematical proof that it yields better results than having the target and main network be the same at all times?
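For concreteness, here is a minimal sketch of the pattern I mean (written in PyTorch; the network shape, `SYNC_EVERY`, and the function name are made up purely for illustration):

```python
import copy
import torch
import torch.nn as nn

# Online (main) network and a frozen copy that acts as the target network.
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = copy.deepcopy(q_net)

optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
GAMMA = 0.99
SYNC_EVERY = 100  # copy the main weights into the target every 100 steps

def td_update(step, states, actions, rewards, next_states, dones):
    # The TD target is computed with the *target* network, so it stays
    # fixed between syncs instead of moving with every gradient step.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + GAMMA * (1.0 - dones) * next_q

    # Q-values of the actions actually taken come from the main network.
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q_values, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Periodic hard sync: the target network jumps to the main network.
    if step % SYNC_EVERY == 0:
        target_net.load_state_dict(q_net.state_dict())
```

Setting `SYNC_EVERY = 1` would make the target and main network identical at every update, which is exactly the variant I am asking about.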
The use of a target network, like nearly everything in machine learning, was arrived at empirically. It was introduced by the DeepMind team in their seminal Nature paper on DQN, "Human-level control through deep reinforcement learning" (Mnih et al., 2015).
And it just became what everyone does: a de facto standard.