I was under the impression that the enable_categorical parameter lets me skip manual label encoding. The error I am getting seems to contradict that? (I think)
The error seems to be triggered by calling the "fit" method on my "reg" object. Here is the error:
ValueError: Invalid classes inferred from unique values of `y`. Expected: [0 1 2 3 4 5], got ['Not Approved' 'Resolved-Approved' 'Resolved-Cancelled' 'Resolved-Not Approved' 'Resolved-Partially Approved' 'Resolved-Withdrawn']
import xgboost as xgb

FEATURES = ['Type', 'DivisionName', 'DepartmentName', 'WarehouseName', 'CategoryDesc']
TARGET = 'ClaimStatus'

X_train = train[FEATURES].astype('category')
y_train = train[TARGET].astype('category')
X_test = test[FEATURES].astype('category')
y_test = test[TARGET].astype('category')

reg = xgb.XGBClassifier(base_score=0.5, booster='gbtree',
                        n_estimators=1000,
                        early_stopping_rounds=50,
                        enable_categorical=True,
                        max_depth=5,
                        learning_rate=0.01)
reg.fit(X_train, y_train,
        eval_set=[(X_train, y_train), (X_test, y_test)],
        verbose=100)
`enable_categorical` doesn't affect the target type; it's for performing bipartition splits on categorical features: https://xgboost.readthedocs.io/en/release_2.0.0/tutorials/categorical.html
You may use sklearn's `LabelEncoder` to encode the target; `XGBClassifier`, being specifically a classifier, will treat the resulting integers as class labels. (I'm a little surprised it's needed, though; all sklearn classifiers handle that internally. I thought I remembered some extra parameter to enable/disable a label encoder, but I can't find it now...)
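A minimal sketch of that encoding step, using a few of the class labels from the error message (the variable names `y_enc` and `le` are mine, and the `fit` call is shown only as a comment since it depends on your `train`/`test` data):

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

# A small sample of the target labels from the error message.
y_raw = ["Resolved-Approved", "Not Approved", "Resolved-Withdrawn", "Not Approved"]

# fit_transform sorts the unique labels alphabetically and maps each
# label to its index: here 'Not Approved' -> 0, 'Resolved-Approved' -> 1,
# 'Resolved-Withdrawn' -> 2, so y_enc == [1, 0, 2, 0].
y_enc = le.fit_transform(y_raw)

# In your code you would encode y_train (and y_test for eval_set) the
# same way and pass the integer array to fit, e.g.:
# y_train_enc = le.fit_transform(train[TARGET])
# reg.fit(X_train, y_train_enc, ...)

# To get human-readable labels back from predictions:
labels = le.inverse_transform(y_enc)  # -> original string labels
```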