Manual normalization function taking too long to execute


I am trying to implement a normalization function manually rather than using scikit-learn's. The reason is that I need to set the maximum and minimum manually, and scikit-learn doesn't allow that.

I have implemented this successfully, and it normalizes the values between 0 and 1, but it takes a very long time to run.

Question: Is there a more efficient way to do this? How can I make it run faster?

Shown below is my code:

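def scale(data):
    # Row-by-row loop: every .loc read and write goes through pandas
    # indexing, which is why this is slow on large DataFrames
    for index, row in data.iterrows():
        X_std = (data.loc[index, "Close"] - 10) / (2000 - 10)
        data.loc[index, "Close"] = X_std

    return data

scaled_train_data = scale(train_data)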

2000 and 10 are the maximum and minimum that I defined manually, rather than taking the maximum and minimum values of the dataset.
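Written out, the loop applies the ordinary min-max formula to every value, only with the bounds fixed at 10 and 2000 instead of taken from the data. A worked example for a made-up closing price of 1005:

```latex
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
   = \frac{x - 10}{2000 - 10},
\qquad
x = 1005 \;\Rightarrow\; x' = \frac{995}{1990} = 0.5
```

Because the bounds are fixed, any value below 10 or above 2000 lands outside the range [0, 1].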

Thank you in advance.

Answer by Jondiedoop (best answer)

Why loop? You can just use

train_data['Close'] = (train_data['Close'] - 10) / (2000 - 10)

This applies one vectorized NumPy operation to the whole column instead of looping over rows in Python. You could also wrap it in a function if you prefer.
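A minimal sketch of the function version mentioned above, assuming the column is named "Close" as in the question. The function name `scale_close`, its default bounds, and the three sample rows are illustrative only. Unlike the one-liner, it returns a scaled copy so the original DataFrame is left unchanged; drop the `.copy()` if in-place modification is what you want.

```python
import pandas as pd


def scale_close(data: pd.DataFrame, col_min: float = 10, col_max: float = 2000) -> pd.DataFrame:
    """Return a copy of `data` with the "Close" column min-max scaled
    using fixed bounds instead of the column's own min and max."""
    scaled = data.copy()  # leave the caller's DataFrame untouched
    # One vectorized expression over the whole column, no Python-level loop
    scaled["Close"] = (scaled["Close"] - col_min) / (col_max - col_min)
    return scaled


# Made-up sample data for illustration
train_data = pd.DataFrame({"Close": [10.0, 1005.0, 2000.0]})
scaled_train_data = scale_close(train_data)
print(scaled_train_data["Close"].tolist())  # [0.0, 0.5, 1.0]
```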

Alternatively, if you want to rescale to a linear range, you could use http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html. The advantage is that the fitted scaler keeps its parameters, so you can save it and rescale the test data in exactly the same way.
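With scikit-learn itself, one option for fixed bounds is to fit `MinMaxScaler` on an array holding only the two bounds, e.g. `[[10], [2000]]`, so that the learned minimum and maximum are the manual ones. The sketch below shows the same save-and-reuse idea in plain NumPy so it does not depend on scikit-learn; `FixedRangeScaler` is a hypothetical helper, not a scikit-learn class, and the sample values are made up.

```python
import numpy as np


class FixedRangeScaler:
    """Hypothetical helper (not part of scikit-learn): min-max scaling with
    hand-chosen bounds, stored once and reused for every data split."""

    def __init__(self, data_min, data_max):
        self.data_min = np.asarray(data_min, dtype=float)
        self.data_max = np.asarray(data_max, dtype=float)

    def transform(self, values):
        # Map data_min -> 0 and data_max -> 1
        values = np.asarray(values, dtype=float)
        return (values - self.data_min) / (self.data_max - self.data_min)

    def inverse_transform(self, scaled):
        # Undo the scaling, e.g. to turn predictions back into prices
        scaled = np.asarray(scaled, dtype=float)
        return scaled * (self.data_max - self.data_min) + self.data_min


scaler = FixedRangeScaler(10, 2000)
train_scaled = scaler.transform([10, 1005, 2000])
test_scaled = scaler.transform([507.5, 2100])  # same bounds as the training data
print(train_scaled, test_scaled)
```

Test values outside the chosen bounds (2100 here) simply map outside [0, 1]; nothing is clipped.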

Answer by BAKE ZQ

Use a NumPy array. You can also set your min and max manually.

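import numpy as np

data = np.array(df)  # df: your DataFrame (numeric columns only)
# Per-column min and max taken from the data...
_min = np.min(data, axis=0)
_max = np.max(data, axis=0)
# ...or set them manually instead, e.g. _min, _max = 10, 2000
normed_data = (data - _min) / (_max - _min)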
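To set the bounds manually for several columns at once, `_min` and `_max` can be arrays with one entry per column, and NumPy broadcasting applies them to every row. This is a sketch with made-up data and bounds (think of a price column and a volume column):

```python
import numpy as np

# Made-up two-column data: one row per sample, one column per feature
data = np.array([[10.0, 0.0],
                 [1005.0, 50.0],
                 [2000.0, 100.0]])

# Manual per-column bounds instead of np.min / np.max over the data
_min = np.array([10.0, 0.0])
_max = np.array([2000.0, 100.0])

# _min and _max have shape (2,), so they broadcast across all 3 rows
normed_data = (data - _min) / (_max - _min)
print(normed_data)
```

When `_min` and `_max` are computed from the data, a constant column makes `_max - _min` zero and yields `nan` for that column; fixed bounds avoid this as long as each pair differs.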