I am trying to implement a normalization function manually rather than using scikit-learn's. The reason is that I need to set the minimum and maximum parameters myself, and as far as I can tell scikit-learn doesn't allow that.
I successfully implemented this to normalize the values between 0 and 1, but it takes a very long time to run.
Question: Is there a more efficient way to do this? How can I make it execute faster?
Shown below is my code:
    def scale(data):
        # Rescale "Close" row by row using fixed bounds (min 10, max 2000)
        for index, row in data.iterrows():
            X_std = (data.loc[index, "Close"] - 10) / (2000 - 10)
            data.loc[index, "Close"] = X_std
        return data

    scaled_train_data = scale(train_data)
2000 and 10 are the bounds that I defined manually rather than taking the maximum and minimum values of the dataset.
Thank you in advance.
Why loop? You can just apply the same arithmetic to the whole "Close" column at once, which makes use of vectorized numpy functions. Of course, you could also put this in a function, if you prefer.
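As a rough sketch of what that could look like (the bounds 10 and 2000 and the "Close" column come from the question; the `lower`/`upper`/`column` parameters and the sample data are only assumptions for illustration):

```python
import pandas as pd

# Bounds taken from the question's hard-coded values.
LOWER, UPPER = 10, 2000

def scale(data, column="Close", lower=LOWER, upper=UPPER):
    """Return a copy of `data` with `column` rescaled using fixed bounds."""
    scaled = data.copy()
    # One vectorized expression over the whole column, no Python-level loop.
    scaled[column] = (scaled[column] - lower) / (upper - lower)
    return scaled

# Made-up sample data, just to show the call.
train_data = pd.DataFrame({"Close": [10.0, 1005.0, 2000.0]})
scaled_train_data = scale(train_data)
```

Unlike the loop in the question, this version works on a copy and leaves `train_data` untouched. Drop the `.copy()` if you prefer to modify the frame in place.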
Alternatively, if you want to rescale to a linear range, you could use scikit-learn's MinMaxScaler: http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html. The advantage is that you can save the fitted scaler and then rescale the test data in exactly the same manner.
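One possible way to combine MinMaxScaler with manually chosen bounds is to fit it on the two bounds themselves instead of on the data. This is only a sketch under that assumption; the sample frames are made up, and `pickle` stands in for whatever persistence method you prefer:

```python
import pickle

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Assumption: fit on the two manual bounds rather than on the data itself,
# so the scaler maps 10 to 0 and 2000 to 1 for the "Close" column.
bounds = pd.DataFrame({"Close": [10.0, 2000.0]})
scaler = MinMaxScaler().fit(bounds)

# Made-up sample data for illustration.
train_data = pd.DataFrame({"Close": [10.0, 1005.0, 2000.0]})
test_data = pd.DataFrame({"Close": [507.5]})

train_scaled = scaler.transform(train_data[["Close"]])  # array of shape (3, 1)

# Round-trip through pickle to mimic saving and reloading the scaler.
restored = pickle.loads(pickle.dumps(scaler))
test_scaled = restored.transform(test_data[["Close"]])
```

Note that `transform` returns a NumPy array rather than a DataFrame, so assign the result back to the column if you need to keep the frame.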