I am using SHAP and, while experimenting with different mask tokens, I noticed that they lead to quite different explanations, which I can see from the resulting .base_values of the explanation object. Here is how I tried out the different mask tokens:
import shap

mask_tokens = ['[MASK]', '', '...']
for mask in mask_tokens:
    # Split on non-word characters and collapse consecutive masked
    # tokens into a single mask token.
    masker = shap.maskers.Text(tokenizer=r"\W+", mask_token=mask, collapse_mask_token=True)
    explainer = shap.Explainer(pipe, masker=masker, seed=1)
    shap_values = explainer(some_text)
In my case the underlying task is binary text classification, where the class distribution is roughly 40:60 and the model is a fine-tuned BERT. The base values (i.e., the model's prediction for the mask token alone, pipe(mask_token)) are as follows (a sketch for reproducing them follows the list):
- '' (the empty string) leads to [0.41945168, 0.58054835]
- '...' (the SHAP default) leads to [0.15069047, 0.8493095]
- '[MASK]' (the mask token used by the model's own tokenizer) leads to [0.28819373, 0.7118063]
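These base values can be reproduced by running the pipeline on the mask token alone (a minimal sketch; it assumes pipe is the same text-classification pipeline used above and that it returns the scores for both classes):

for mask in ['[MASK]', '', '...']:
    # The base value equals the model's prediction when the entire
    # input is replaced by the mask token.
    print(repr(mask), pipe(mask))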
Is there a "right" mask token to use in order to get the most faithful explanations possible? How does one go about deciding which one to use?