It seems to be common practice in Deep Q-learning to have the target network trail the main network, syncing the two every 100 or so steps, but I am not clear on why that is.
The best explanation I have received as to why is the rather ambiguous:

it prevents the net from chasing its own tail
Is there a mathematical proof that it yields better results than having the target and main network be the same at all times?
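For concreteness, here is a minimal sketch of the pattern I mean (written in PyTorch; the network shape, `SYNC_EVERY`, and the function name are made up purely for illustration):

```python
import copy
import torch
import torch.nn as nn

# Online (main) network and a frozen copy that acts as the target network.
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = copy.deepcopy(q_net)

optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
GAMMA = 0.99
SYNC_EVERY = 100  # copy the main weights into the target every 100 steps

def td_update(step, states, actions, rewards, next_states, dones):
    # The TD target is computed with the *target* network, so it stays
    # fixed between syncs instead of moving with every gradient step.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + GAMMA * (1.0 - dones) * next_q

    # Q-values of the actions actually taken come from the main network.
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q_values, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Periodic hard sync: the target network jumps to the main network.
    if step % SYNC_EVERY == 0:
        target_net.load_state_dict(q_net.state_dict())
```

Setting `SYNC_EVERY = 1` would make the target and main network identical at every update, which is exactly the variant I am asking about.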
The use of a target network, like nearly everything in machine learning, was arrived at empirically. It was introduced by the DeepMind team in their seminal Nature paper on DQN, "Human-level control through deep reinforcement learning" (Mnih et al., 2015).
And it just became what everyone does: a de facto standard.