Why does PyTorch autograd return 'None' and throw an error when the variable is not exponentiated?

I am seeing some strange behavior from PyTorch autograd that I do not understand while trying to compute the second partial derivative with respect to x. The code

import torch

def f(x, y):
    return x**1
x = torch.tensor([3.0,2.0], requires_grad=True)
y = torch.tensor([2.0,3.0], requires_grad=True)
z = f(x,y)
# differentiate z with respect to x twice
for _ in range(2):
    dx = torch.autograd.grad(z, x, grad_outputs=torch.ones_like(x), create_graph=True, allow_unused=True)[0]
    z = dx
print(z)

as expected, gives

tensor([0., 0.])

However, if instead of raising x to the power of 1 we multiply x by y, as follows,

def f(x, y):
    return x*y
x = torch.tensor([3.0,2.0], requires_grad=True)
y = torch.tensor([2.0,3.0], requires_grad=True)
z = f(x,y)
# differentiate z with respect to x twice
for _ in range(2):
    dx = torch.autograd.grad(z, x, grad_outputs=torch.ones_like(x), create_graph=True, allow_unused=True)[0]
    z = dx
print(z)

we get the output

None

Worst of all, just having the function return x, as follows,

def f(x, y):
    return x
x = torch.tensor([3.0,2.0], requires_grad=True)
y = torch.tensor([2.0,3.0], requires_grad=True)
z = f(x,y)
# differentiate z with respect to x twice
for _ in range(2):
    dx = torch.autograd.grad(z, x, grad_outputs=torch.ones_like(x), create_graph=True, allow_unused=True)[0]
    z = dx
print(z)

throws the following error:

RuntimeError                              Traceback (most recent call last)
Cell In[59], line 9
  7 # differentiate z with respect to x twice
  8 for _ in range(2):
----> 9     dx = torch.autograd.grad(z, x, grad_outputs=torch.ones_like(x), create_graph=True, allow_unused=True)[0]
 10     z = dx
 11 print(z)

 File ~/miniconda3/envs/randomvenv/lib/python3.8/site-packages/torch/autograd/__init__.py:394, in grad(outputs, inputs, grad_outputs, retain_graph, create_graph, only_inputs, allow_unused, is_grads_batched, materialize_grads)
390     result = _vmap_internals._vmap(vjp, 0, 0, allow_none_pass_through=True)(
391         grad_outputs_
392     )
393 else:
--> 394     result = Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
395         t_outputs,
396         grad_outputs_,
397         retain_graph,
398         create_graph,
399         t_inputs,
400         allow_unused,
401         accumulate_grad=False,
402     )  # Calls into the C++ engine to run the backward pass
403 if materialize_grads:
404     result = tuple(
405         output
406         if output is not None
407         else torch.zeros_like(input, requires_grad=True)
408         for (output, input) in zip(result, t_inputs)
409     )

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Does anyone understand what is going on?

1 Answer

Answer by Karl:

First, the example

def f(x, y):
    return x

Autograd tracks the operations performed on tensors. Your function only returns the input tensor x, so no operations are performed and there is no computation graph to differentiate through. When you compute the first derivative, you get the tensor [1., 1.], which is the gradient of x with respect to itself. Because no computation produced this tensor, it is not attached to any computation graph: its requires_grad attribute is False and its grad_fn is None. When you then try to compute the second derivative, your z value is this first-derivative tensor, which does not require grad and has no grad_fn, and that raises the error.
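
You can see this directly by inspecting the first derivative (a minimal check of my own, reusing the tensors from the question):

import torch

x = torch.tensor([3.0, 2.0], requires_grad=True)
z = x  # no operation is recorded; z is literally the leaf tensor x

dx = torch.autograd.grad(z, x, grad_outputs=torch.ones_like(x),
                         create_graph=True, allow_unused=True)[0]
print(dx)                # tensor([1., 1.])
print(dx.requires_grad)  # False, so a second grad call through dx cannot work
print(dx.grad_fn)        # None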

Second, the example of

def f(x, y):
    return x*y

The first derivative of z wrt x is y. For computational simplicity, MulBackward populates the gradient tensor with the data/computational graph of y. This means the first derivative tensor does not have x in its computational graph. You can verify this by removing the allow_unused parameter:

def f(x, y):
    return x*y
x = torch.tensor([3.0,2.0], requires_grad=True)
y = torch.tensor([2.0,3.0], requires_grad=True)
z = f(x,y)

dx = torch.autograd.grad(z, x, grad_outputs=torch.ones_like(x), create_graph=True)[0]
dx2 = torch.autograd.grad(dx, x, grad_outputs=torch.ones_like(x))[0]

Computing dx2 will raise the error "One of the differentiated Tensors appears to not have been used in the graph". There is no chain of operations linking dx to x, so there is no derivative to compute.
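
If you want a zero tensor rather than None in this x*y case, one option (assuming a PyTorch version whose grad signature accepts materialize_grads, which the traceback above suggests) is to have autograd materialize unused gradients as zeros. A minimal sketch:

import torch

def f(x, y):
    return x*y

x = torch.tensor([3.0, 2.0], requires_grad=True)
y = torch.tensor([2.0, 3.0], requires_grad=True)
z = f(x, y)

dx = torch.autograd.grad(z, x, grad_outputs=torch.ones_like(x),
                         create_graph=True)[0]  # dx equals y; x is not in its graph
dx2 = torch.autograd.grad(dx, x, grad_outputs=torch.ones_like(x),
                          allow_unused=True, materialize_grads=True)[0]
print(dx2)  # a zero tensor ([0., 0.]) instead of None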

If you instead compute the second derivative with a function that retains x in the first derivative's compute graph, things will work:

def f(x, y):
    return x.pow(y)
x = torch.tensor([3.0,2.0], requires_grad=True)
y = torch.tensor([2.0,3.0], requires_grad=True)
z = f(x,y)

dx = torch.autograd.grad(z, x, grad_outputs=torch.ones_like(x), create_graph=True)[0]
dx2 = torch.autograd.grad(dx, x, grad_outputs=torch.ones_like(x))[0]
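
For these inputs, dx2 should come out as [2., 12.], matching the analytic second derivative y*(y-1)*x**(y-2) (a quick sanity check, continuing from the snippet above):

print(dx2)                       # tensor([ 2., 12.])
print(y * (y - 1) * x**(y - 2))  # analytic second derivative, same values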