Can somebody explain to me the shape and value of TensorFlow's GradientTape output when the target is not a scalar value? For example, I had the following code:
import tensorflow as tf
a = tf.Variable([[-1.], [0.], [1.]])
b = tf.Variable([[1.,2.,3.],[4.,5.,6.]])
with tf.GradientTape() as g:
    c = b @ a
grads = g.gradient(c, a)
print(c)
print(grads)
The value of c is [[2.],[2.]]. The value of grads is [[5.],[7.],[9.]].
I expected grads to have shape (3, 2) or (2, 3) and to contain the partial derivatives of each entry of c with respect to each entry of a. I am not sure what the values 5, 7, and 9 represent (interestingly, they seem to be the gradients as if c had been tf.reduce_sum(b @ a) instead).
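Here is what I mean by the reduce_sum comparison (just checking my guess with a second tape, reusing a and b from above):
with tf.GradientTape() as g2:
    c_sum = tf.reduce_sum(b @ a)  # scalar target this time
print(g2.gradient(c_sum, a))  # prints [[5.], [7.], [9.]] -- the same as grads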
The documentation that I found doesn't really explain the output.
Because your target is non-scalar, the underlying object is really the Jacobian matrix of partial derivatives, not a single gradient vector.
GradientTape.gradient accumulates (i.e. sums) these partial derivatives over the output dimensions, so the result has the same shape as the variable itself. That way, when we apply the gradients, we can subtract them from the variable's values directly.
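For example, a plain gradient-descent step is then just an element-wise subtraction (a minimal sketch with an arbitrary learning rate, reusing a and grads from the question):
learning_rate = 0.1
a.assign_sub(learning_rate * grads)  # a <- a - 0.1 * [[5.], [7.], [9.]]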
If you don't want this to happen, or you just want to see how the gradients are accumulated, you can ask for the full Jacobian instead (in TF 2.7 and up).
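A minimal sketch, assuming GradientTape.jacobian is the intended call here (it returns every partial derivative before any summing):
import tensorflow as tf
a = tf.Variable([[-1.], [0.], [1.]])
b = tf.Variable([[1., 2., 3.], [4., 5., 6.]])
with tf.GradientTape() as g:
    c = b @ a  # shape (2, 1)
# Full Jacobian: one partial derivative per (output, input) pair,
# so its shape is c.shape + a.shape = (2, 1, 3, 1).
j = g.jacobian(c, a)
print(tf.squeeze(j))  # [[1. 2. 3.], [4. 5. 6.]]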
Here you end up with a gradient tensor like:
[[1, 2, 3], [4, 5, 6]] (I dropped the singleton dimensions), which is more like what you expected. What the gradient tape then does by default is sum over the output dimensions, which gives us [5, 7, 9].
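You can check that relationship directly by summing the Jacobian from the sketch above over its output axes:
print(tf.reduce_sum(j, axis=[0, 1]))  # [[5.], [7.], [9.]] -- same as g.gradient(c, a)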