CL · LG · Apr 23, 2024

Does It Make Sense to Explain a Black Box With Another Black Box?

arXiv:2404.14943v1 · 1 citation · h-index: 4
Originality: Synthesis-oriented
AI Analysis

This work examines interpretability in NLP from a practitioner's perspective, arguing that marginal performance gains may not justify the added complexity of some explanation methods.

The paper compares transparent and opaque counterfactual explanation methods for black-box NLP classifiers and finds that opaque methods add complexity without significant performance gains on tasks such as fake news detection and sentiment analysis.

Although counterfactual explanations are a popular approach to explaining ML black-box classifiers, they are less widespread in NLP. Most methods find these explanations by iteratively perturbing the target document until the black box classifies it differently. We identify two main families of counterfactual explanation methods in the literature, namely, (a) transparent methods that perturb the target by adding, removing, or replacing words, and (b) opaque approaches that project the target document into a latent, non-interpretable space where the perturbation is subsequently carried out. This article offers a comparative study of the performance of these two families of methods on three classical NLP tasks. Our empirical evidence shows that opaque approaches can be overkill for downstream applications such as fake news detection or sentiment analysis, since they add an extra level of complexity with no significant performance gain. These observations motivate our discussion, which raises the question of whether it makes sense to explain a black box using another black box.
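To make the "transparent" family concrete, here is a minimal sketch of a generic greedy word-level counterfactual search. It is not the authors' method or any of the specific methods they benchmark; `toy_black_box`, `transparent_counterfactual`, the word weights, and the `replacements` dictionary are all hypothetical, chosen only for illustration. At each step the search tries every single-word removal, replacement, or (optional) insertion, keeps the edit that pushes the black box furthest toward the opposite class, and stops as soon as the predicted label flips. An opaque method would instead apply its perturbations to a latent vector and decode it back to text, so its intermediate steps would not be readable word edits like these.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical stand-in for a trained black-box classifier: returns P(positive).
def toy_black_box(tokens: List[str]) -> float:
    weights = {"great": 2.0, "good": 1.0, "love": 1.5,
               "bad": -1.0, "boring": -1.5, "awful": -2.0}
    z = sum(weights.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-z))

def label(p: float) -> int:
    # Threshold the positive-class probability at 0.5.
    return int(p >= 0.5)

def transparent_counterfactual(
    tokens: List[str],
    predict_proba: Callable[[List[str]], float],
    replacements: Dict[str, List[str]],
    insertions: Sequence[str] = (),
    max_edits: int = 3,
) -> Optional[List[str]]:
    """Greedy search over word removals, replacements, and insertions (hypothetical sketch)."""
    original = label(predict_proba(tokens))

    def flip_score(candidate: List[str]) -> float:
        # Probability mass assigned to the class opposite to the original prediction.
        p = predict_proba(candidate)
        return p if original == 0 else 1.0 - p

    current = list(tokens)
    for _ in range(max_edits):
        candidates: List[List[str]] = []
        for i, word in enumerate(current):
            candidates.append(current[:i] + current[i + 1:])          # remove word i
            for alt in replacements.get(word, []):
                candidates.append(current[:i] + [alt] + current[i + 1:])  # replace word i
        for extra in insertions:
            candidates.append(current + [extra])                       # add a word at the end
        if not candidates:
            return None
        current = max(candidates, key=flip_score)  # keep the most class-flipping edit
        if label(predict_proba(current)) != original:
            return current
    return None  # no counterfactual found within the edit budget

if __name__ == "__main__":
    doc = ["the", "movie", "was", "great", "and", "good"]
    cf = transparent_counterfactual(doc, toy_black_box, {"great": ["awful"], "good": ["bad"]})
    print(cf)  # prints ['the', 'movie', 'was', 'awful', 'and', 'good']
```

Greedy selection keeps the example short; practical methods typically add constraints such as fluency or semantic similarity to the original document, which this sketch omits.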

Code implementations: 1 repo