I am using SHAP and, while experimenting with different mask tokens, I noticed that they lead to quite different explanations, which I can see from the resulting .base_values of the explanation object. Here is how I tried out the different mask tokens:
import shap

mask_tokens = ['[MASK]', '', '...']
for mask in mask_tokens:
    # Split on non-word characters and collapse consecutive masked
    # tokens into a single mask token.
    masker = shap.maskers.Text(tokenizer=r"\W+", mask_token=mask, collapse_mask_token=True)
    explainer = shap.Explainer(pipe, masker=masker, seed=1)
    shap_values = explainer(some_text)
In my case the underlying task is binary text classification, where the class distribution is roughly 40:60 and the model is a fine-tuned BERT. The base values (i.e., the model's prediction for the mask token alone, pipe(mask_token)) are as follows (a sketch for reproducing them follows the list):
- '' (the empty string) leads to [0.41945168, 0.58054835]
- '...' (the SHAP default) leads to [0.15069047, 0.8493095]
- '[MASK]' (the mask token used by the model's own tokenizer) leads to [0.28819373, 0.7118063]
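These base values can be reproduced by running the pipeline on the mask token alone (a minimal sketch; it assumes pipe is the same text-classification pipeline used above and that it returns the scores for both classes):

for mask in ['[MASK]', '', '...']:
    # The base value equals the model's prediction when the entire
    # input is replaced by the mask token.
    print(repr(mask), pipe(mask))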
Is there a "right" mask token to use in order to get the most faithful explanations possible? How does one go about deciding which one to use?