On the Granularity of Explanations in Model Agnostic NLP Interpretability
This work addresses inaccurate and inefficient explanations of NLP models, particularly BERT-based classifiers, by perturbing text at a coarser granularity than individual words.
This paper identifies two limitations of word-based sampling for interpreting BERT-based classifiers: it produces out-of-distribution texts, and it yields a high-dimensional search space. It proposes using segments, such as sentences, as elementary building blocks, which significantly improves fidelity on a benchmark classification task.
Current methods for black-box NLP interpretability, such as LIME or SHAP, perturb the text to be explained by removing words and then model the black-box response. In this paper, we outline the limitations of this approach when applied to complex BERT-based classifiers: word-based sampling produces texts that are out-of-distribution for the classifier, and it gives rise to a high-dimensional search space that cannot be sufficiently explored when time or computational power is limited. Both challenges can be addressed by using segments as the elementary building blocks for NLP interpretability. As an illustration, we show that the simple choice of sentences substantially mitigates both challenges. As a consequence, the resulting explainer attains much better fidelity on a benchmark classification task.
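The mechanics can be illustrated with a minimal sketch of a LIME-style explainer whose elementary unit is pluggable: words (the usual choice) or sentences. This is not the authors' implementation, nor the API of the LIME or SHAP libraries. The helpers `split_words`, `split_sentences`, `solve` and `explain`, the regex-based sentence splitter, the distance (fraction of segments removed), the kernel width, the ridge term and the keyword-based toy classifier are all hypothetical choices made for illustration. The sketch makes the search-space argument concrete: a text with d units admits 2^d keep/remove masks, so the 14-word review below has 16,384 possible perturbations while its 3 sentences have only 8. A toy classifier cannot reproduce the out-of-distribution behaviour of a real BERT model, so that aspect is not shown here.

```python
import math
import random
import re
from typing import Callable, List, Tuple

BlackBox = Callable[[str], float]  # maps a (perturbed) text to a class score


def split_words(text: str) -> List[str]:
    # Word-level units, the usual granularity of LIME/SHAP-style text explainers.
    return text.split()


def split_sentences(text: str) -> List[str]:
    # Naive sentence splitter (assumption): split after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    # Gauss-Jordan elimination with partial pivoting for a small dense system a x = b.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def explain(
    text: str,
    black_box: BlackBox,
    splitter: Callable[[str], List[str]],
    n_samples: int = 500,
    kernel_width: float = 0.25,  # hypothetical default
    ridge: float = 1e-3,  # small penalty keeping the normal equations well-posed
    seed: int = 0,
) -> List[Tuple[str, float]]:
    """Fit a locally weighted linear surrogate over binary keep/remove masks of segments."""
    segments = splitter(text)
    d = len(segments)
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for i in range(n_samples):
        # First sample is the unperturbed text; others keep each segment with probability 1/2.
        mask = [1] * d if i == 0 else [rng.randint(0, 1) for _ in range(d)]
        perturbed = " ".join(seg for seg, keep in zip(segments, mask) if keep)
        removed = (d - sum(mask)) / d  # distance to the original: fraction of segments removed
        rows.append([1.0] + [float(keep) for keep in mask])  # intercept column + mask features
        targets.append(black_box(perturbed))
        weights.append(math.exp(-(removed ** 2) / kernel_width ** 2))  # exponential kernel
    # Weighted ridge normal equations (X^T W X + ridge * I') beta = X^T W y,
    # where I' leaves the intercept unpenalised.
    p = d + 1
    xtwx = [
        [
            sum(w * r[j] * r[k] for r, w in zip(rows, weights))
            + (ridge if j == k and j > 0 else 0.0)
            for k in range(p)
        ]
        for j in range(p)
    ]
    xtwy = [sum(w * r[j] * y for r, y, w in zip(rows, targets, weights)) for j in range(p)]
    beta = solve(xtwx, xtwy)
    return list(zip(segments, beta[1:]))  # one importance weight per segment


if __name__ == "__main__":
    review = "The plot was slow. The acting was excellent! I would not watch it again."

    def toy_classifier(t: str) -> float:
        # Hypothetical black box: score 1 iff the keyword survives the perturbation.
        return 1.0 if "excellent" in t else 0.0

    for name, splitter in [("words", split_words), ("sentences", split_sentences)]:
        units = splitter(review)
        print(f"{name}: {len(units)} units, {2 ** len(units)} possible perturbations")
        ranked = sorted(explain(review, toy_classifier, splitter), key=lambda sw: -sw[1])
        for seg, w in ranked[:2]:
            print(f"  {w:+.3f}  {seg!r}")
```

Swapping `split_words` for `split_sentences` is the only change between the two runs. With sentences, the surrogate estimates 3 coefficients instead of 14 from the same sampling budget, which is the search-space argument of the abstract on a small scale.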