Using AdaBoostClassifier with null values


I'm trying to implement an AdaBoostClassifier model in Python. I'm using a dataset where all columns are numeric and some of the values are null.

Using AdaBoost in R, it seems that R deals with nulls automatically. However, when I try to do the same thing in Python I get the error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

If I try to manually fix this with:

X.fillna(X.mean(), inplace=True)

The problem goes away, but I don't want to replace the null values with the column mean. Can AdaBoostClassifier work with nulls in Python, or do I have to treat them first?
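If the concern is that mean-filling hides which values were missing, one common alternative (a sketch, not something AdaBoostClassifier does on its own) is to fill each gap with a sentinel value and add a 0/1 column per feature that flags the missing entries, so the tree stumps can split on the flag. The helper name `add_missing_flags`, the sentinel -999 and the `_missing` suffix are hypothetical choices for illustration; the sentinel should lie outside the observed range of every column, which may not hold for columns like ebitda that can be negative. The toy DataFrame stands in for the real spreadsheet.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier


def add_missing_flags(X: pd.DataFrame, sentinel: float = -999.0) -> pd.DataFrame:
    """Hypothetical helper: fill NaNs with `sentinel` and append one 0/1
    flag column for each feature that had at least one NaN."""
    flags = X.isna()
    flags = flags.loc[:, flags.any()].astype(int).add_suffix("_missing")
    return pd.concat([X.fillna(sentinel), flags], axis=1)


# Toy stand-in for the real data: two numeric features with gaps.
X = pd.DataFrame({
    "sales": [10.0, np.nan, 30.0, 40.0, np.nan, 60.0],
    "ebitda": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
})
y = pd.Series([0, 0, 1, 1, 0, 1])

X_filled = add_missing_flags(X)
model = AdaBoostClassifier(n_estimators=10, random_state=0).fit(X_filled, y)

print(X_filled.columns.tolist())
# ['sales', 'ebitda', 'sales_missing', 'ebitda_missing']
print(model.predict(X_filled))
```

The same helper has to be applied, with the same sentinel, to any data passed to `predict` later.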

PS: I tried passing allow_nan=True to the validation function that AdaBoost uses, but I really don't know how to do that correctly.

Thank you

import pandas as pd
from sklearn.ensemble import AdaBoostClassifier

# Raw string so the backslashes in the Windows path are not read as escape sequences
data = pd.read_excel(r"C:\Users\file.xlsx")

# Numeric feature columns; the target is the last column of the sheet
X = data[["oprevenue", "total_assets", "fixed_assets", "cost_of_employees",
          "sales", "ebitda", "volume", "number_of_employees"]]
y = data.iloc[:, -1]

AdaModel = AdaBoostClassifier(n_estimators=100, learning_rate=1)
model = AdaModel.fit(X, y)  # --> Blows up here: X contains NaN, so fit raises ValueError
previsao = model.predict(X)

Answer by Jane Delaney:

The default base estimator for AdaBoostClassifier is a DecisionTreeClassifier (a decision stump with max_depth=1), which, in the scikit-learn versions current when this was asked, does not handle missing values. You need to either impute your missing values first or use a different base estimator that can handle them.
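A minimal sketch of the "impute first" route, assuming a reasonably recent scikit-learn: wrapping SimpleImputer and AdaBoostClassifier in a Pipeline means the fill values are learned during `fit` and reused unchanged in `predict`. The median strategy and `add_indicator=True` (which appends 0/1 "was missing" columns) are illustrative choices rather than the only options, and the synthetic data from make_classification stands in for the Excel file.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the spreadsheet: 8 numeric features, as in the question.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.1] = np.nan  # knock out roughly 10% of the values

# The imputer replaces NaNs with each column's median and appends one
# 0/1 indicator column per feature that had missing values during fit.
model = make_pipeline(
    SimpleImputer(strategy="median", add_indicator=True),
    AdaBoostClassifier(n_estimators=100, learning_rate=1.0, random_state=0),
)
model.fit(X, y)  # no ValueError: AdaBoost only ever sees imputed data
predictions = model.predict(X)
print(predictions[:10])
```

If switching to a different ensemble is acceptable, scikit-learn's HistGradientBoostingClassifier accepts NaN inputs directly and learns which branch missing values should follow, so no imputation step is needed there.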