import torch

a = torch.nn.Parameter(torch.ones(5, 5))
a = a.cuda()
print(a.requires_grad)
b = a
b = b - 2
print('a ', a)
print('b ', b)
loss = (b - 1).pow(2).sum()
loss.backward()
print(a.grad)
print(b.grad)
After executing this code, `a.grad` is None even though `a.requires_grad` is True.
But if the line `a = a.cuda()` is removed, `a.grad` is available after the backward pass.
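For reference, a minimal sketch of the working case without the `.cuda()` call:

import torch

a = torch.nn.Parameter(torch.ones(5, 5))  # leaf tensor, requires_grad=True
b = a - 2
loss = (b - 1).pow(2).sum()
loss.backward()
print(a.grad)  # populated: a 5x5 tensor of -4s
print(b.grad)  # None: b is an intermediate (non-leaf) tensor, so its gradient is not kept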
That happens because of your line `a = a.cuda()`, which overrides the original value of `a`: the result of `a.cuda()` is a new, non-leaf tensor derived from the parameter, and autograd only populates `.grad` for leaf tensors by default. You could use
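The answer's original snippet is not preserved here; a minimal sketch of this option, assuming the intent is to move the data to the GPU before wrapping it in a Parameter so that `a` stays a leaf tensor:

import torch

# Wrap a tensor that already lives on the GPU, so `a` itself is a leaf
# Parameter and accumulates gradients in a.grad after backward().
a = torch.nn.Parameter(torch.ones(5, 5).cuda())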
Or
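(Again a sketch rather than the answer's original code, assuming the second option creates the tensor directly on the GPU:)

import torch

# Allocate on the GPU at construction time; `a` is still a leaf Parameter.
a = torch.nn.Parameter(torch.ones(5, 5, device='cuda'))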
Or explicitly request that the gradients of `a` be retained:
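A sketch of this option, keeping the rebinding from the question and asking autograd to keep the gradient of the new non-leaf tensor:

import torch

a = torch.nn.Parameter(torch.ones(5, 5))
a = a.cuda()      # `a` is now a non-leaf tensor on the GPU
a.retain_grad()   # keep this non-leaf tensor's gradient during backward()

b = a - 2
loss = (b - 1).pow(2).sum()
loss.backward()
print(a.grad)     # populated (a 5x5 tensor of -4s) instead of None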
Erasing the gradients of intermediate variables can save a significant amount of memory, so it is good to retain gradients only where you need them.