I am seeing some strange behavior from PyTorch autograd that I do not understand while trying to compute the second partial derivative with respect to x. The code
import torch

def f(x, y):
    return x**1

x = torch.tensor([3.0, 2.0], requires_grad=True)
y = torch.tensor([2.0, 3.0], requires_grad=True)
z = f(x, y)

# differentiate z with respect to x twice
for _ in range(2):
    dx = torch.autograd.grad(z, x, grad_outputs=torch.ones_like(x), create_graph=True, allow_unused=True)[0]
    z = dx
print(z)
as expected, gives
tensor([0., 0.])
However, replacing the exponentiation of x by 1 with a multiplication by y, as follows,
def f(x, y):
    return x*y

x = torch.tensor([3.0, 2.0], requires_grad=True)
y = torch.tensor([2.0, 3.0], requires_grad=True)
z = f(x, y)

# differentiate z with respect to x twice
for _ in range(2):
    dx = torch.autograd.grad(z, x, grad_outputs=torch.ones_like(x), create_graph=True, allow_unused=True)[0]
    z = dx
print(z)
we get the output
None
Worst of all, just having the function return x as follows:
def f(x, y):
    return x

x = torch.tensor([3.0, 2.0], requires_grad=True)
y = torch.tensor([2.0, 3.0], requires_grad=True)
z = f(x, y)

# differentiate z with respect to x twice
for _ in range(2):
    dx = torch.autograd.grad(z, x, grad_outputs=torch.ones_like(x), create_graph=True, allow_unused=True)[0]
    z = dx
print(z)
throws the following error:
RuntimeError Traceback (most recent call last)
Cell In[59], line 9
7 # differentiate z with respect to x twice
8 for _ in range(2):
----> 9 dx = torch.autograd.grad(z, x, grad_outputs=torch.ones_like(x), create_graph=True, allow_unused=True)[0]
10 z = dx
11 print(z)
File ~/miniconda3/envs/randomvenv/lib/python3.8/site-packages/torch/autograd/__init__.py:394, in grad(outputs, inputs, grad_outputs, retain_graph, create_graph, only_inputs, allow_unused, is_grads_batched, materialize_grads)
390 result = _vmap_internals._vmap(vjp, 0, 0, allow_none_pass_through=True)(
391 grad_outputs_
392 )
393 else:
--> 394 result = Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
395 t_outputs,
396 grad_outputs_,
397 retain_graph,
398 create_graph,
399 t_inputs,
400 allow_unused,
401 accumulate_grad=False,
402 ) # Calls into the C++ engine to run the backward pass
403 if materialize_grads:
404 result = tuple(
405 output
406 if output is not None
407 else torch.zeros_like(input, requires_grad=True)
408 for (output, input) in zip(result, t_inputs)
409 )
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Does anyone understand what is going on?
First, the example where f returns x.

Autograd tracks the operations performed on tensors. Your function only returns the input tensor x. Since no operations are performed, there is no gradient to compute. When you compute the first derivative, you get tensor([1., 1.]), which is the gradient of x with respect to itself, but because no computation was recorded, that gradient tensor is not part of any computation graph. As a result, the requires_grad attribute of the first derivative is False and it has no grad_fn. When you then try to compute the second derivative, your z is this first-derivative tensor, which has no grad_fn, and that raises the error.
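You can check this directly (a small verification snippet I'm adding here, reusing the x from the question):

import torch

x = torch.tensor([3.0, 2.0], requires_grad=True)
z = x  # f(x, y) = x performs no operation, so z is just x itself
dx = torch.autograd.grad(z, x, grad_outputs=torch.ones_like(x), create_graph=True)[0]
print(dx)                # tensor([1., 1.]) -- gradient of x with respect to itself
print(dx.requires_grad)  # False: nothing was recorded, so dx is not part of any graph
print(dx.grad_fn)        # None, so a second grad() call has nothing to differentiate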
Second, the example where f returns x*y.

The first derivative of z wrt x is y. For computational simplicity, MulBackward populates the gradient tensor with the data/computational graph of y. This means the first-derivative tensor does not have x in its computational graph. You can verify this by removing the allow_unused parameter: computing the second derivative then raises the error "One of the differentiated Tensors appears to not have been used in the graph." There is no chain of operations linking dx back to x, so there is no derivative to be computed; with allow_unused=True you simply get None instead of an error.
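To make this concrete (a sketch I'm adding, using the same x and y as in the question): the first derivative carries y's graph, so it can be differentiated again with respect to y, but not with respect to x.

import torch

def f(x, y):
    return x*y

x = torch.tensor([3.0, 2.0], requires_grad=True)
y = torch.tensor([2.0, 3.0], requires_grad=True)
z = f(x, y)

dx = torch.autograd.grad(z, x, grad_outputs=torch.ones_like(x), create_graph=True)[0]
print(dx)  # tensor([2., 3.], grad_fn=...) -- the values of y

# x is not in dx's graph, so differentiating dx w.r.t. x without allow_unused fails:
try:
    torch.autograd.grad(dx, x, grad_outputs=torch.ones_like(x))
except RuntimeError as e:
    print(e)  # One of the differentiated Tensors appears to not have been used in the graph ...

# y is in dx's graph, so differentiating dx w.r.t. y works:
print(torch.autograd.grad(dx, y, grad_outputs=torch.ones_like(y))[0])  # tensor([1., 1.])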
If you try to compute the second derivative with a function that retains x in the first derivative's compute chain, things will work.
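For example (a hypothetical variant I'm adding, not from the original post), f(x, y) = x**2 * y keeps x in the first derivative 2*x*y, so the loop from the question produces the second derivative 2*y:

import torch

def f(x, y):
    return x**2 * y

x = torch.tensor([3.0, 2.0], requires_grad=True)
y = torch.tensor([2.0, 3.0], requires_grad=True)
z = f(x, y)

# differentiate z with respect to x twice, exactly as in the question
for _ in range(2):
    dx = torch.autograd.grad(z, x, grad_outputs=torch.ones_like(x), create_graph=True, allow_unused=True)[0]
    z = dx
print(z)  # tensor([4., 6.], grad_fn=...) -- i.e. 2*y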