Understanding the calculation of Perceptron weights in Python


I am trying to understand how the Perceptron's weights are calculated, for example, with this fit method:

def fit(self, X, y):
    # Weight vector: w_[0] is the bias unit, w_[1:] are the feature weights.
    self.w_ = np.zeros(1 + X.shape[1])
    self.errors_ = []
    for _ in range(self.n_iter):
        errors = 0
        for xi, target in zip(X, y):
            # Perceptron learning rule: eta * (true label - predicted label)
            update = self.eta * (target - self.predict(xi))
            self.w_[1:] += update * xi
            self.w_[0] += update
            # Count the sample as misclassified if an update occurred.
            errors += int(update != 0.0)
        self.errors_.append(errors)
    return self
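For reference, here is how I call fit (a minimal sketch; I'm assuming the surrounding class is named Perceptron and that its constructor stores eta and n_iter, since those attributes are used above):

    import numpy as np

    # Hypothetical usage sketch: Perceptron, eta and n_iter are assumed
    # names, inferred from the attributes referenced in fit above.
    X = np.array([[5.1, 1.4],
                  [4.9, 1.4]])   # two samples, two features each
    y = np.array([-1, -1])       # class labels in {-1, 1}

    ppn = Perceptron(eta=0.01, n_iter=10)
    ppn.fit(X, y)
    print(ppn.w_)       # [bias, w1, w2] after training
    print(ppn.errors_)  # misclassifications per epoch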

Let's imagine that on the first iteration of the for loop we have:

xi = np.array([5.1, 1.4])
target = -1
self.eta = 0.01
self.w_ = np.array([0., 0., 0.])

Then self.predict(xi) is called to compute update:

def predict(self, X):
    # Unit step: predict 1 if the net input is >= 0, else -1.
    return np.where(self.net_input(X) >= 0.0, 1, -1)

And it calls self.net_input(X):

def net_input(self, X):
    # Weighted sum of the inputs plus the bias unit.
    return np.dot(X, self.w_[1:]) + self.w_[0]
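With the numbers from above, both functions can be checked in a standalone snippet (inlining the weights instead of going through the class):

    import numpy as np

    # Standalone check of net_input and predict on the first sample,
    # with the weights still at their initial zeros.
    xi = np.array([5.1, 1.4])
    w = np.array([0., 0., 0.])   # w[0] plays the role of the bias

    net_input = np.dot(xi, w[1:]) + w[0]
    prediction = np.where(net_input >= 0.0, 1, -1)

    print(net_input)   # 0.0
    print(prediction)  # 1, because 0.0 >= 0.0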

Then we have these calculations:

np.dot(X, self.w_[1:]) + self.w_[0] equals (5.1*0. + 1.4*0.) + 0 = 0

np.where(self.net_input(X) >= 0.0, 1, -1) equals 1 (because self.net_input(X) = 0)

update = self.eta * (target - self.predict(xi)) equals 0.01 * (-1 - 1) = -0.02

self.w_[1:] += update * xi equals [0., 0.] += -0.02 * 0.01 = [-0.0002, -0.0002]

self.w_[0] += update equals 0 + (-0.02) = -0.02

And that's what I expect to get:

self.w_ = array([-0.02 , -0.0002, -0.0002])
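The update value itself can be double-checked in isolation (a small sketch continuing from the standalone check above):

    # Continuing from the standalone check: the net input was 0.0,
    # so the prediction is 1; update then follows the learning rule.
    eta = 0.01
    target = -1
    prediction = 1

    update = eta * (target - prediction)
    print(update)  # 0.01 * (-1 - 1) = -0.02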

However, what I actually see at the breakpoint after the first iteration is:

self.w_ = array([-0.02 , -0.102, -0.028])

I started learning ML 2 days ago, so maybe I'm missing something important?

P.S. The code itself works fine.


2 Answers

Pruha (Best Answer)

Hello bloowy, I'm here to tell you to use the sigmoid, which is equivalent to a softmax over two elements, and like this you can find a solution to your problem.
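That equivalence can be sanity-checked numerically (a standalone sketch; sigmoid(z) matches the first component of a softmax over [z, 0]):

    import numpy as np

    # Sketch: the sigmoid is the two-element special case of softmax.
    # softmax([z, 0])[0] = e^z / (e^z + e^0) = 1 / (1 + e^-z) = sigmoid(z)
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def softmax(v):
        e = np.exp(v - np.max(v))  # shift by the max for numerical stability
        return e / e.sum()

    z = 0.7
    print(sigmoid(z))                      # 0.668187...
    print(softmax(np.array([z, 0.0]))[0])  # same value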

Lawrence Specter

The goal of gradient descent is to minimize the cost function C(w, x). Notice that it is parameterized by the weights w in addition to the input x. Since the function is generally too complex to solve analytically for a w such that C(w, x) is minimal, we use an iterative method: gradient descent. We compute the gradient of C with respect to w, which points in the direction of steepest increase, and move w in the opposite direction to decrease C. However, to avoid overshooting, we scale each step by eta, the learning rate, so we move only a small step at a time. With each step we approach a minimum of the function. It might not be the global minimum, or a zero of the function, but it will at least be a local minimum in the end.
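A minimal sketch of that update rule, on a simple quadratic cost rather than the perceptron above (the cost and its minimum here are illustrative assumptions):

    import numpy as np

    # Gradient descent on C(w) = ||w - w_opt||^2, whose gradient is
    # 2 * (w - w_opt); w_opt is a made-up minimum for demonstration.
    w_opt = np.array([3.0, -1.0])

    def grad(w):
        return 2.0 * (w - w_opt)

    eta = 0.1            # learning rate: small steps avoid overshooting
    w = np.zeros(2)      # arbitrary starting point
    for _ in range(100):
        w -= eta * grad(w)   # move opposite to the gradient

    print(w)  # converges close to [3.0, -1.0]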