```python
import torch
import torch.optim as optim
import torch.nn as nn

input = torch.tensor([1., 2.], requires_grad=True)
sigmoid = nn.Sigmoid()
interm = sigmoid(input)
optimizer = optim.SGD([input], lr=1, momentum=0.9)

for epoch in range(5):
    optimizer.zero_grad()
    loss = torch.linalg.vector_norm(interm - torch.tensor([2., 2.]))
    print(epoch, loss, input, interm)
    loss.backward(retain_graph=True)
    optimizer.step()

print(interm.grad)
```
So I created this simplified example with an input going into a sigmoid as an intermediate activation function. I am trying to find the `input` that results in `interm = [2., 2.]`, but the gradients are not passing through. Does anyone know why?
Grads are computed for leaf tensors. In your example, `input` is a leaf tensor, while `interm` is not. When you try to access `interm.grad`, you should get the following warning:

> UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at aten/src/ATen/core/TensorBody.h:486.)

This is because grads are propagated back to the leaf tensor `input`, not to `interm`. You can add `interm.retain_grad()` if you want the grad for the `interm` variable.
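For example, a minimal sketch of the `retain_grad()` approach, keeping everything else from your snippet as-is:

```python
interm = sigmoid(input)
interm.retain_grad()  # ask autograd to keep the grad of this non-leaf tensor

loss = torch.linalg.vector_norm(interm - torch.tensor([2., 2.]))
loss.backward()
print(interm.grad)  # now populated instead of being None with a warning
```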
However, even if you did this, there is nothing in your example that would cause the value of `interm` to change. Each optimizer step changes the `input` value, but this does not result in `interm` being recomputed. If you want `interm` to be updated, you need to recompute it each iteration with the new `input` value, i.e.:
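Along these lines (a sketch of the corrected loop; note that `retain_graph=True` is no longer needed, since the graph is rebuilt every iteration):

```python
import torch
import torch.optim as optim
import torch.nn as nn

input = torch.tensor([1., 2.], requires_grad=True)
sigmoid = nn.Sigmoid()
target = torch.tensor([2., 2.])
optimizer = optim.SGD([input], lr=1, momentum=0.9)

for epoch in range(5):
    optimizer.zero_grad()
    interm = sigmoid(input)  # recomputed from the updated input each iteration
    loss = torch.linalg.vector_norm(interm - target)
    print(epoch, loss.item(), input, interm)
    loss.backward()
    optimizer.step()
```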
There's also a fundamental problem with what you are trying to do. You say you want the `input` that results in `interm = [2., 2.]`. However, you are computing `interm = sigmoid(input)`, and the sigmoid function is bounded between (0, 1). There is no value of `input` that would result in `interm = [2., 2.]`, because 2 is outside the range of the sigmoid function. If you ran your optimization loop indefinitely, you would get `input = [inf, inf]` and `interm = [1., 1.]`.
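If you pick a target inside (0, 1) instead, the problem becomes solvable, and you can even check the optimizer's answer against the closed-form inverse of the sigmoid (the logit). A quick illustration, using a hypothetical reachable target of [0.9, 0.9]:

```python
import torch

target = torch.tensor([0.9, 0.9])         # inside sigmoid's range (0, 1)
input = torch.log(target / (1 - target))  # logit(t) = log(t / (1 - t)), the inverse of sigmoid
print(torch.sigmoid(input))               # tensor([0.9000, 0.9000])
```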