import torch

a = torch.nn.Parameter(torch.ones(5, 5))
a = a.cuda()
print(a.requires_grad)
b = a
b = b - 2
print('a ', a)
print('b ', b)
loss = (b - 1).pow(2).sum()
loss.backward()
print(a.grad)
print(b.grad)
After executing this code, `a.grad` is None even though `a.requires_grad` is True.
But if the line `a = a.cuda()` is removed, `a.grad` is available after the backward pass.
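For reference, a minimal sketch of the working case without the `.cuda()` call:

import torch

a = torch.nn.Parameter(torch.ones(5, 5))  # leaf tensor, requires_grad=True
b = a - 2
loss = (b - 1).pow(2).sum()
loss.backward()
print(a.grad)  # populated: a 5x5 tensor of -4s
print(b.grad)  # None: b is an intermediate (non-leaf) tensor, so its gradient is not kept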
That happens because of your line `a = a.cuda()`, which overrides the original value of `a`: the result of `a.cuda()` is a new, non-leaf tensor derived from the parameter, and autograd only populates `.grad` for leaf tensors by default. You could use
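The answer's original snippet is not preserved here; a minimal sketch of this option, assuming the intent is to move the data to the GPU before wrapping it in a Parameter so that `a` stays a leaf tensor:

import torch

# Wrap a tensor that already lives on the GPU, so `a` itself is a leaf
# Parameter and accumulates gradients in a.grad after backward().
a = torch.nn.Parameter(torch.ones(5, 5).cuda())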
Or
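(Again a sketch rather than the answer's original code, assuming the second option creates the tensor directly on the GPU:)

import torch

# Allocate on the GPU at construction time; `a` is still a leaf Parameter.
a = torch.nn.Parameter(torch.ones(5, 5, device='cuda'))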
Or explicitly request that the gradients of `a` be retained:
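A sketch of this option, keeping the rebinding from the question and asking autograd to keep the gradient of the new non-leaf tensor:

import torch

a = torch.nn.Parameter(torch.ones(5, 5))
a = a.cuda()      # `a` is now a non-leaf tensor on the GPU
a.retain_grad()   # keep this non-leaf tensor's gradient during backward()

b = a - 2
loss = (b - 1).pow(2).sum()
loss.backward()
print(a.grad)     # populated (a 5x5 tensor of -4s) instead of None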
Erasing the gradients of intermediate variables can save a significant amount of memory, so it is good to retain gradients only where you need them.