I've been studying a simple and straightforward DQN implementation, but I'm having trouble understanding a core part of the training process.
Feeding in batches of 64, they compute the current Q value for each sample, the corresponding target Q value, and from those the TD error. How can we train the neural network with just a single value per sample? From my understanding of back-propagation, we need the complete outputs of both the current and target networks, and the loss difference for each output value. That is, with 4 actions (output neurons) we should have a total of 4 × 64 values to train the model with. How is it possible to train with one value per sample? How does the network know what to change if we give no indication of which outputs were far off from the target?
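For context, I believe the per-sample Q value is picked out with a one-hot mask over the actions actually taken, along these lines (my reconstruction; q_t, act_t_ph, and num_actions are my guesses at the surrounding graph):

# q_t has shape (64, num_actions); the mask keeps only the Q value of the action taken
q_t_selected = tf.reduce_sum(q_t * tf.one_hot(act_t_ph, num_actions), axis=1)  # shape (64,)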
On top of that, they take the mean of all 64 errors and train on that single scalar, which seems to reduce even further the information the network gets about how to adjust its parameters.
What am I missing?
Code for reference:
td_error = q_t_selected - tf.stop_gradient(q_t_selected_target)  # Q(s,a;θ_i) - (r + gamma * max_a' Q(s',a';θ_i^-))
errors = U.huber_loss(td_error)                                  # per-sample Huber loss of the TD error, shape (64,)
weighted_error = tf.reduce_mean(importance_weights_ph * errors)  # importance-weighted mean over the batch -> one scalar
optimizer.minimize(weighted_error, var_list=q_func_vars)         # back-propagate through the online network's variables only
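To make the question concrete, here is a minimal standalone sketch of the training step I'm puzzled by (TF2-style with toy shapes; the network, data, and Huber formula are my own illustration, not the original code, and I've dropped the importance weights):

import numpy as np
import tensorflow as tf

num_actions, batch_size = 4, 64

q_net = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(num_actions),
])
optimizer = tf.keras.optimizers.Adam(1e-3)

obs = np.random.randn(batch_size, 8).astype(np.float32)    # toy states
actions = np.random.randint(num_actions, size=batch_size)  # actions actually taken
targets = np.random.randn(batch_size).astype(np.float32)   # stand-in for r + gamma * max_a' Q(s',a';θ^-)

with tf.GradientTape() as tape:
    q_t = q_net(obs)                                        # shape (64, 4): all action values
    q_t_selected = tf.reduce_sum(
        q_t * tf.one_hot(actions, num_actions), axis=1)     # shape (64,): one value per sample
    td_error = q_t_selected - targets                       # shape (64,)
    errors = tf.where(tf.abs(td_error) < 1.0,
                      0.5 * tf.square(td_error),
                      tf.abs(td_error) - 0.5)               # per-sample Huber loss
    loss = tf.reduce_mean(errors)                           # single scalar
grads = tape.gradient(loss, q_net.trainable_variables)
optimizer.apply_gradients(zip(grads, q_net.trainable_variables))

Even in this stripped-down version, the optimizer only ever sees the scalar loss, which is exactly the part I don't get.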