Coding logistic SGD manually in a for loop in Python -- am I interpreting this correctly?

I'm following the steps of SGD, but I'm unsure whether I am interpreting them correctly.

Let's say there are two w terms:

$$p(x) = \frac{1}{1 + e^{-(w_1 + w_2 x)}}$$

where x is a tensor and w1, w2 are the two parameters of the logistic function.

equation2

and the goal is to find the values of w1 and w2 that minimize lambda.

The SGD update formula is given as:

equation3

This is the code I have so far, with library constraints:

import numpy as np
import torch

some_data = torch.rand(1000, 2)
x = some_data[:,0]
y = (x + 0.4 * some_data[:,1] > 0.5).to(torch.int) # doesn't work with bool?

x = torch.split(x, 3) # batch
y = torch.split(y, 3)

# optim problem
w1 = torch.autograd.Variable(torch.tensor([0.1]), requires_grad=True)
w2 = torch.autograd.Variable(torch.tensor([0.1]), requires_grad=True)

lambda_min = torch.zeros(1)

for epoch in range(10):
  #print(f'epoch: ', epoch)
  for xx, yy in zip(x, y):
    
    p_x = 1 / (1 + torch.exp(-w1 - (torch.mul(w2, x))))
    
    # NLL
    lambda = torch.sum(yy * torch.log(p_x) + (1 + yy) * torch.log(1 - p_x))
    
    # Gradient???
    if lambda < lambda_min :
      lambda_min = lambda # Q: loss value, right?
      # Q: update params...?
      w2 = w1 - torch.mul(0.01, lambda_min)
      print(f'w2 = ', w2)
    else:
      pass

I would like to confirm the parts marked with "Q:". Then, how would I plot this against y and x so that it looks like a regular logistic function, similar to the one below, but with all the 1s and 0s shown in the graph?

1

Answer by ForceBru

There are multiple issues with this code.

  1. You never calculate any gradients, so this is not gradient descent.
  2. Gradient descent always updates the parameters, not only when lambda < lambda_min. Also, lambda is a reserved keyword in Python and can't be used as a variable name.
  3. w2 = w1 - torch.mul(0.01, lambda_min) is not a gradient descent step:
    1. Gradient descent sets the new w2 equal to the old w2 plus some adjustment. Your code sets it equal to w1 (!) plus some adjustment.
    2. That adjustment is supposed to be the scaled negative gradient. In your code it's the scaled value of the loss function, which is not the gradient.
  4. According to your formula for the loss, (1 + yy) * torch.log(1 - p_x) should be (1 - yy) * torch.log(1 - p_x) (note the "1 minus yy" part).
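
For reference, here is a minimal sketch of what the loop could look like with the points above fixed, assuming you want autograd to compute the gradients. The names `nll`, `lr`, `x_batches` and `y_batches` are mine, not from your code, and the fixed seed is only there to make runs repeatable. I also assumed that the loss should use the current batch `xx` (your code passes the full `x`) and that the sum should be negated so that it really is a negative log-likelihood.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed, only for repeatable runs

# Same synthetic data as in the question; labels cast to float for the loss
some_data = torch.rand(1000, 2)
x = some_data[:, 0]
y = (x + 0.4 * some_data[:, 1] > 0.5).to(torch.float32)

# Mini-batches of 3 samples, as in the question
x_batches = torch.split(x, 3)
y_batches = torch.split(y, 3)

# Leaf tensors with requires_grad=True (no need for autograd.Variable)
w1 = torch.tensor([0.1], requires_grad=True)
w2 = torch.tensor([0.1], requires_grad=True)
lr = 0.01  # step size, taken from the 0.01 in the question


def nll(w1, w2, xx, yy):
    """Negative log-likelihood of the logistic model on one batch."""
    p = torch.sigmoid(w1 + w2 * xx)  # same as 1 / (1 + exp(-w1 - w2 * xx))
    return -torch.sum(yy * torch.log(p) + (1 - yy) * torch.log(1 - p))


with torch.no_grad():
    initial_loss = nll(w1, w2, x, y).item()

for epoch in range(10):
    for xx, yy in zip(x_batches, y_batches):
        loss = nll(w1, w2, xx, yy)
        loss.backward()  # fills w1.grad and w2.grad

        # Gradient step on every batch: new w = old w - lr * gradient
        with torch.no_grad():
            w1 -= lr * w1.grad
            w2 -= lr * w2.grad

        # Gradients accumulate across backward() calls, so reset them
        w1.grad.zero_()
        w2.grad.zero_()

with torch.no_grad():
    final_loss = nll(w1, w2, x, y).item()

print(f"w1 = {w1.item():.3f}, w2 = {w2.item():.3f}")
print(f"loss on all data: {initial_loss:.1f} -> {final_loss:.1f}")
```

Note that each parameter is updated from its own previous value and its own gradient, and that there is no comparison against a running minimum of the loss: the step is taken on every batch.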