Extracting the parameters and gradient norm used to fit data in PyTorch


ORIGINAL CODE

def get_theta(self):
    theta = self.parameters().detach().cpu
    return theta

def get_norm2Gradient(self):
    theta = get_theta(self)
    loss = loss(self, xb, yb)
    grad = loss.backward()
    for param in theta:
        grad.append(param.grad)
    #computes gradient norm
    norm2Gradient = torch.linalg.norm(grad)
    return norm2Gradient

def fit(self, loader, epochs = 2000):
    norm2Gradient = 1
    while norm2Gradient <10e-3 and epochs <2000:
        for _, batch in enumerate(loader):
            x, y = batch['x'], batch['y']
            #computes f.cross_entropy loss of (xb,yb) on GPU 
            loss = self.loss(x,y) 
            #print("loss:", loss)
            loss = loss.mean()
            #print("loss mean:", loss)
            #clears out old gradients  
            self.optimizer.zero_grad()
            #calculates new gradients
            grad = loss.backward()
            print("grad:",grad)
            #takes one step along new gradients to decrease the loss
            self.optimizer.step()  
            #captures new parameters
            theta = self.parameters()
            print("theta:",theta)
            #collects gradient along new parameters
            for param in theta:
                grad.append(param.grad)
            #computes gradient norm
            norm2Gradient = torch.linalg.norm(grad)
    return grad

CURRENT QUESTION and CODE (corrected per Karl's 3/2/2024 feedback)

I am trying to extract values that are computed during the fit function in PyTorch: the parameters themselves, and the L2 norm of the gradient. Here is my code for these objectives.

def get_theta(self):
    theta = self.parameters().detach().cpu
    return theta

def fit(self, loader, epochs = 2000):
    norm2Gradient = 1
    while norm2Gradient >10e-3 and epochs <2000:
        for _, batch in enumerate(loader):
            x, y = batch['x'], batch['y']
            #computes f.cross_entropy loss of (xb,yb) on GPU 
            loss = self.loss(x,y) 
            #print("loss:", loss)
            loss = loss.mean()
            #print("loss mean:", loss)
            #clears out old gradients  
            self.optimizer.zero_grad()
            #calculates new gradients
            grad = loss.backward()
            print("grad:",grad)
            #takes one step along new gradients to decrease the loss
            self.optimizer.step()  
            #captures new parameters
            theta = self.parameters()
            print("theta:",theta)
            #collects gradient along new parameters
            for param in theta:
                grad.append(param.grad)
            #computes gradient norm
            norm2Gradient = torch.linalg.norm(grad)
            sumNorm2Gradient += norm2Gradient.detach().cpu
    return sumNorm2Gradient

Here is the recurring error message

AttributeError: 'NoneType' object has no attribute 'append'

It occurs at this line of code.

grad.append(param.grad) 

I printed out the grad variable, and it is None.

My intention was to capture the gradient with the following line of code.

grad = loss.backward()

What is a better way to get at the gradient that is computed inside the fit function?

Similarly: Does this line capture the parameters?

theta = self.parameters()

Thank you!

There is 1 answer below.

Answer by Karl

This is due to an error in your while condition:

def fit(self, loader, epochs = None):
    #loss_mean = []
    norm2Gradient = 1
    while norm2Gradient <10e-3  and epochs <2000:
        ... 
    return grad

Since norm2Gradient is initialized to 1, the condition norm2Gradient <10e-3 evaluates to False and the while loop never executes. The function then tries to return grad, but the grad variable was never assigned. This triggers the error.
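
For illustration, here is a minimal sketch of the corrected control flow, keeping the names from the question. The epoch counter and the get_grad_norm helper are assumptions added for the sketch (the original loop never updates epochs, and the norm computation is sketched after the next paragraph):

def fit(self, loader, max_epochs=2000, tol=1e-3):
    norm2Gradient = float("inf")  # start above the tolerance so the loop body runs
    grad_norms = []               # always defined, even if the loop exits immediately
    epoch = 0
    while norm2Gradient > tol and epoch < max_epochs:
        for batch in loader:
            x, y = batch['x'], batch['y']
            loss = self.loss(x, y).mean()
            self.optimizer.zero_grad()
            loss.backward()                       # fills param.grad; returns None
            self.optimizer.step()
            norm2Gradient = self.get_grad_norm()  # hypothetical helper, sketched below
            grad_norms.append(norm2Gradient)
        epoch += 1
    return grad_norms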

That said, there is another issue with your approach. Your gradient tensors have different shapes, so you can't simply collect them in a list and take the L2 norm of that list. You probably want to compute the L2 norm of each gradient tensor individually, then compute the average.
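
As a rough illustration of that suggestion (and of the two questions above), here is a sketch of methods that could live on the same class. It assumes the module exposes self.parameters() as in the question, and that get_grad_norm is called after loss.backward() and before the next zero_grad():

import torch

def get_theta(self):
    # self.parameters() yields the parameter tensors; the generator itself has no
    # .detach() or .cpu(), so handle each tensor individually
    return [p.detach().cpu().clone() for p in self.parameters()]

def get_grad_norm(self):
    # loss.backward() returns None; the gradients it computes are stored
    # on the parameters themselves, in param.grad
    norms = [torch.linalg.norm(p.grad) for p in self.parameters()
             if p.grad is not None]
    # L2 norm of each gradient tensor, then the average, as suggested above
    return torch.stack(norms).mean().item()

Called right after loss.backward() (and before the gradients are cleared), get_grad_norm returns a single Python float, and get_theta returns a CPU snapshot of the parameters at that step.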