I'm trying to implement an AdaBoostClassifier model in Python. I'm using a dataset where all columns are numeric and some of the values are null.
Using AdaBoost in R, it seems that R deals with nulls automatically, but when I try the same thing in Python I get this error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
If I try to manually fix this with:
X.fillna(X.mean(), inplace=True)
The problem goes away, but I don't want to replace the null values with the mean. Can AdaBoostClassifier work with nulls in Python, or do I have to treat them first?
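The mean is not the only way to fill the gaps. Below is a minimal pandas-only sketch of one alternative: record where values were missing in extra 0/1 columns, then fill the gaps with a constant sentinel so a tree can still separate "was missing" rows from the rest. The toy data, the `_missing` suffix and the sentinel value -999 are illustrative assumptions, not part of the original question; choose a sentinel outside each column's real range (EBITDA, for example, can be negative).

```python
import numpy as np
import pandas as pd

# Toy frame reusing two column names from the question; the values are made up.
X = pd.DataFrame({
    "sales": [100.0, np.nan, 250.0, 80.0],
    "ebitda": [10.0, 5.0, np.nan, np.nan],
})

SENTINEL = -999.0  # hypothetical placeholder; pick a value no real row can take

# Record where values were missing before filling: one 0/1 column per feature.
indicators = X.isna().astype(int).add_suffix("_missing")

# Replace NaN with the sentinel instead of the column mean, then append the indicators.
X_filled = pd.concat([X.fillna(SENTINEL), indicators], axis=1)

print(X_filled)
```

The resulting frame contains no NaN, so AdaBoostClassifier will accept it, while the indicator columns keep the "this value was missing" information that mean-filling throws away.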
PS: I tried to pass allow_nan=True to the validation function that AdaBoost uses, but I really don't know how to do that correctly.
Thank you
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification
# Raw string so the backslashes in the Windows path are not read as escape sequences
data = pd.read_excel(r"C:\Users\file.xlsx")
# Feature columns; the last column of the sheet is the target
X = data[["oprevenue", "total_assets", "fixed_assets", "cost_of_employees", "sales", "ebitda", "volume", "number_of_employees"]]
y = data.iloc[:, -1]
AdaModel = AdaBoostClassifier(n_estimators=100, learning_rate=1)
model = AdaModel.fit(X, y)  # --> Blows up here: X still contains NaN
previsao = model.predict(X)
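Before choosing a fill strategy, it can help to confirm which of the selected columns actually contain missing (or infinite) values, since either one triggers the ValueError above. This is a small sketch using a made-up stand-in for `X`; in practice you would run the two checks on the DataFrame read from the Excel file.

```python
import numpy as np
import pandas as pd

# Stand-in for the question's X; the values are invented for illustration.
X = pd.DataFrame({
    "oprevenue": [1.0, 2.0, np.nan],
    "total_assets": [5.0, 6.0, 7.0],
    "ebitda": [np.nan, np.nan, 3.0],
})

# Count missing values per column; any non-zero count makes fit() raise the ValueError.
missing_per_column = X.isna().sum()
print(missing_per_column[missing_per_column > 0])

# The same error is raised for infinite values, so check for those as well.
n_inf = np.isinf(X.to_numpy()).sum()
print("infinite values:", n_inf)
```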
The default base estimator for AdaBoostClassifier is a decision tree (a DecisionTreeClassifier limited to depth 1), which does not handle missing values. You need to either impute the missing values first or use a different base estimator that can handle them.
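A minimal sketch of the impute-first option, assuming a reasonably recent scikit-learn (`SimpleImputer` with `add_indicator` needs version 0.21 or later). Synthetic data from `make_classification` stands in for the Excel file, with about 10% of the values blanked out at random. Median imputation and the extra indicator columns are one choice among several, not the only fix; `SimpleImputer(strategy="constant", fill_value=...)` gives the sentinel approach instead.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the question's data; blank out roughly 10% of the cells.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.1] = np.nan

# The imputer runs before AdaBoost sees the data, so the NaN check in fit() passes.
# add_indicator=True appends a 0/1 column for each feature that had missing values.
model = make_pipeline(
    SimpleImputer(strategy="median", add_indicator=True),
    AdaBoostClassifier(n_estimators=100, learning_rate=1.0, random_state=0),
)
model.fit(X, y)
previsao = model.predict(X)
```

Because the imputer is fitted inside the pipeline, the medians learned during `fit` are reused at predict time, so new data is filled the same way. If you would rather not impute at all, scikit-learn's HistGradientBoostingClassifier (a gradient-boosting model, not AdaBoost) accepts NaN in its input directly in recent versions.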