KNNImputer Evaluation


I used KNNImputer to fill in missing values in my dataset, but I am having trouble evaluating the result. When I use MAE or MSE to compare the original and imputed datasets, I get the error: Input contains NaN, infinity or a value too large for dtype('float64'). Of course, the original data contains missing values, so that is where the NaNs come from. Cross-validation does not seem to help either, since I would have to split the data, and I am not sure that is appropriate because my data is timestamped, with the different sensors as columns.

Code for calculating MSE:

import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.metrics import mean_squared_error


# create a copy of data_clean to impute missing values
df = data_clean.copy()

# apply KNN imputation
imputer = KNNImputer(n_neighbors=5)
df[df.columns[1:]] = imputer.fit_transform(df[df.columns[1:]])

# calculate mean squared error for imputed values only
mask = ~df[df.columns[1:]].isna()  # create a mask to only consider imputed values
mse = mean_squared_error(data_clean[df.columns[1:]][mask], df[df.columns[1:]][mask])
print(f"Mean Squared Error: {mse}")
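One common way around this error is to score the imputer only on values whose true reading is known: hide a random share of the observed entries, impute, and compare the imputed values with the hidden originals. The sketch below assumes that approach. The synthetic `data_clean` frame, the sensor column names, and the 10% holdout rate are all made up for illustration and are not taken from the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Hypothetical stand-in for data_clean: a timestamp column plus sensor columns
n = 200
t = np.arange(n)
data_clean = pd.DataFrame({
    "timestamp": pd.date_range("2023-01-01", periods=n, freq="D"),
    "sensor_a": np.sin(t / 10) + rng.normal(0, 0.05, n),
    "sensor_b": np.cos(t / 10) + rng.normal(0, 0.05, n),
    "sensor_c": t / n + rng.normal(0, 0.05, n),
})
# Simulate genuinely missing readings (their true values are unknown)
data_clean.loc[rng.random(n) < 0.05, "sensor_a"] = np.nan

sensor_cols = data_clean.columns[1:]
values = data_clean[sensor_cols].to_numpy(dtype=float)

# Entries that were actually observed; only these can serve as ground truth
observed = ~np.isnan(values)

# Hide an assumed 10% of the observed entries to act as a test set
holdout = observed & (rng.random(values.shape) < 0.10)
masked = values.copy()
masked[holdout] = np.nan

# Impute the artificially masked data
imputed = KNNImputer(n_neighbors=5).fit_transform(masked)

# Score only the held-out positions, where the true value is known
mse = mean_squared_error(values[holdout], imputed[holdout])
print(f"Mean Squared Error on held-out values: {mse}")
```

Because both arrays passed to `mean_squared_error` are indexed by `holdout`, neither contains NaN. For sensor time series, randomly scattered holdout points may give an optimistic score if the real gaps are contiguous, so hiding short consecutive blocks instead might be a closer match to the actual missingness pattern.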