I am trying to plot a SHAP analysis of an XGBoost model I trained, something similar to this.
However, I used the DART booster, so shap.TreeExplainer does not work. I am therefore trying to use shap.KernelExplainer, which should work for me, but it does not accept any of the common input types.
My code is like this:
First Attempt
# Data to predict
full_data = xgb.DMatrix(full_X, label=full_y, feature_names=feature_names)
# Pre-trained XGB model using DART booster
loaded_model.set_param({"device": "cuda"})
xgb_predict = lambda x: loaded_model.predict(x)
explainer = shap.KernelExplainer(xgb_predict, full_data)
And I get:
TypeError: Unknown type passed as data object: <class 'xgboost.core.DMatrix'>
Second Attempt
I have also tried to provide a numpy array:
X_np = np.array(full_X)
explainer = shap.KernelExplainer(xgb_predict, X_np)
But it also returns an error:
TypeError: ('Expecting data to be a DMatrix object, got: ', <class 'numpy.ndarray'>)
I am using shap 0.44.0 and xgboost 2.0.2
How can I resolve the problem?
What is really happening
In case someone else faces this problem, here is what I found:
shap.KernelExplainer tries to convert the data (source code in here and here). So it basically does not recognize the xgboost.core.DMatrix type. But if one passes a DataFrame or a numpy array, it gets through this conversion and then fails when it is handed to the model, because the model's predict method expects a DMatrix.
The workaround
To solve this, I passed a pandas DataFrame as data to shap.KernelExplainer and added a conversion to a DMatrix inside the supplied function that returns the model's predictions.