I am new to gradient descent and I'm completely lost on the exercise below. The first part is an explanation with a simple example. Here is that example:
When training the model, we want to find parameters (denoted $\Theta$) that minimize the total loss across all training examples:
$$\Theta^* = \arg\min_{\Theta} L(\Theta).$$
To do this, we will iteratively reduce the error by updating the parameters in the direction that incrementally lowers the loss function. This algorithm is called gradient descent. The most naive application of gradient descent consists of taking the derivative of the loss function. Let us see how to do this.
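A minimal sketch of the full update loop $\Theta \leftarrow \Theta - \eta \nabla_\Theta L(\Theta)$ in PyTorch. It assumes a made-up quadratic loss $L(\Theta) = \sum_i (\Theta_i - 3)^2$ and an arbitrarily chosen learning rate $\eta = 0.1$. Neither comes from the exercise. Each step computes the gradient with `backward()` and then moves the parameters a small step against it.

```python
import torch

# Hypothetical toy loss (an assumption for illustration): L(theta) = sum((theta - 3)^2),
# which is minimized when every coordinate of theta equals 3.
def loss_fn(theta):
    return ((theta - 3.0) ** 2).sum()

theta = torch.zeros(2, requires_grad=True)  # initial parameters
lr = 0.1  # learning rate (step size), chosen arbitrarily

for step in range(100):
    loss = loss_fn(theta)
    loss.backward()                 # compute dL/dtheta and store it in theta.grad
    with torch.no_grad():
        theta -= lr * theta.grad    # step against the gradient, without tracking this update
    theta.grad.zero_()              # reset the gradient before the next iteration

print(theta.detach())
```

Clearing `theta.grad` on every iteration matters because PyTorch accumulates gradients across successive `backward()` calls.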
As a toy example, say that we are interested in differentiating the function $y = 2\mathbf{x}^\top\mathbf{x}$ with respect to the column vector $\mathbf{x}$. To start, let us create the variable `x` and assign it an initial value.
Here is the code:
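import torch

x = torch.arange(4.0)      # tensor([0., 1., 2., 3.])
x.requires_grad_(True)     # tell autograd to track operations on x
x.grad                     # None: no gradient has been computed yet
y = 2 * torch.dot(x, x)    # y = 2 * x^T x, a scalar
y.backward()               # compute dy/dx and store it in x.grad
x.grad
# Check that the gradient was calculated correctly: dy/dx = 4x
x.grad == 4 * x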
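The value the check compares against, $4\mathbf{x}$, follows from writing $y$ out coordinate by coordinate:

$$y = 2\mathbf{x}^\top\mathbf{x} = 2\sum_{i} x_i^2 \quad\Rightarrow\quad \frac{\partial y}{\partial x_i} = 4x_i \quad\Rightarrow\quad \nabla_{\mathbf{x}}\, y = 4\mathbf{x}.$$

With $\mathbf{x} = (0, 1, 2, 3)$, this gives `x.grad` $= (0, 4, 8, 12)$.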
And now, based on the above I have to solve this:
Let $f(x) = \sin(x)$. Plot $f(x)$ and $\frac{df(x)}{dx}$, where the latter is computed without exploiting that $f'(x) = \cos(x)$.
import numpy as np
import torch

x = np.linspace(-np.pi, np.pi, 100)
x = torch.tensor(x, requires_grad=True)
y = torch.sin(x)
...and now what?
I tried:
y.backward()
x.grad
but I'm getting an error that y is not a scalar value.
I need to pass these assertions:
assert torch.allclose(x.grad[10].float(), torch.Tensor([-0.8053]), rtol=1e-2)
assert torch.allclose(x.grad[50].float(), torch.Tensor([0.9995]), rtol=1e-2)
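The target values in these assertions are just $\cos(x)$ evaluated at indices 10 and 50 of the same grid. This short NumPy check uses only the grid from the question and recomputes them analytically, as a sanity check rather than as the required autograd solution:

```python
import numpy as np

# Same grid as in the question: 100 evenly spaced points on [-pi, pi]
x = np.linspace(-np.pi, np.pi, 100)

# The derivative of sin is cos, so these are the values the assertions expect
print(x[10], np.cos(x[10]))  # x[10] = -pi + 10 * (2*pi / 99)
print(x[50], np.cos(x[50]))  # x[50] = -pi + 50 * (2*pi / 99), just above 0
```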
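By default, PyTorch expects you to call `backward` on a scalar value. This is why calling `y.backward()` raises the error `grad can be implicitly created only for scalar outputs`. One solution is to aggregate `y` with a `sum` operation. The derivative of a sum with respect to each of its terms is 1, so this doesn't change the gradients flowing back to `x`, and because `sin` acts elementwise, `x.grad` ends up holding $\frac{d}{dx}\sin(x)$ at every grid point. You can also call `backward` on a vector value if you provide an upstream gradient vector of the same shape (for example `torch.ones_like(y)`). Both methods pass the assertions you provided.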
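Both approaches as a minimal runnable sketch. The function names `grad_via_sum` and `grad_via_upstream_vector` are hypothetical, chosen only for illustration. Each one rebuilds `x` so that gradients from one method don't accumulate into the other:

```python
import numpy as np
import torch

def grad_via_sum():
    x = torch.tensor(np.linspace(-np.pi, np.pi, 100), requires_grad=True)
    y = torch.sin(x)
    # Reduce to a scalar: d(sum_i sin(x_i))/dx_j = cos(x_j),
    # so x.grad holds the elementwise derivative of sin
    y.sum().backward()
    return x, x.grad

def grad_via_upstream_vector():
    x = torch.tensor(np.linspace(-np.pi, np.pi, 100), requires_grad=True)
    y = torch.sin(x)
    # Supply the upstream gradient explicitly: a vector of ones with y's shape and dtype
    y.backward(torch.ones_like(y))
    return x, x.grad

# Both methods satisfy the assertions from the question
for method in (grad_via_sum, grad_via_upstream_vector):
    x, grad = method()
    assert torch.allclose(grad[10].float(), torch.Tensor([-0.8053]), rtol=1e-2)
    assert torch.allclose(grad[50].float(), torch.Tensor([0.9995]), rtol=1e-2)
```

For the plotting part, note that `x` itself requires grad, so it has to be detached before converting to NumPy (`x.detach().numpy()`). `x.grad` does not require grad and can be converted with `.numpy()` directly.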