CL, AI · Sep 16, 2021

Let the CAT out of the bag: Contrastive Attributed explanations for Text

arXiv:2109.07983v3 · 291 citations
Originality: Highly original
AI Analysis

The paper addresses the need for more interpretable and actionable explanations in natural language processing, particularly for users seeking recourse from model decisions, although it is incremental in that it builds on existing contrastive explanation methods.

The paper tackles the problem of generating contrastive explanations for black-box text models by introducing Contrastive Attributed explanations for Text (CAT), which uses attribute classifiers to produce semantically meaningful explanations with minimal edits, and shows that it outperforms state-of-the-art methods on four benchmark metrics across four datasets.
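As a rough formalization of "minimal edits guided by attribute classifiers", one can write a generic regularized search for a contrastive text x' given an input x. This is a sketch under stated assumptions, not the paper's exact objective: f is the black-box classifier, d(x, x') counts edited tokens, R_LM(x') is a fluency penalty from a language model (BERT in the paper), g_k are the attribute classifiers, and β and η are hypothetical trade-off weights.

```latex
x^{*} \;=\; \arg\min_{x'} \; d(x, x') \;+\; \beta \, R_{\mathrm{LM}}(x') \;+\; \eta \sum_{k} \bigl| g_k(x') - g_k(x) \bigr|
\quad \text{subject to} \quad f(x') \neq f(x)
```

The attributes k whose scores differ between x and x* then serve as the explanation; penalizing their total change is one assumed way to keep that explanation short.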

Contrastive explanations for understanding the behavior of black-box models have gained a lot of attention recently because they offer the potential for recourse. In this paper, we propose a method, Contrastive Attributed explanations for Text (CAT), which provides contrastive explanations for natural language text data with a novel twist: we build and exploit attribute classifiers, leading to more semantically meaningful explanations. To ensure that our generated contrastive text has the fewest possible edits with respect to the original text, while also being fluent and close to a human-generated contrastive, we use a minimal perturbation approach regularized using a BERT language model and attribute classifiers trained on available attributes. We show through qualitative examples and a user study that our method not only conveys more insight because of these attributes, but also leads to better-quality (contrastive) text. Quantitatively, we show that our method outperforms other state-of-the-art methods across four datasets on four benchmark metrics.
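To make the minimal-perturbation idea concrete, here is a toy, self-contained Python sketch. It is not CAT's algorithm: the keyword sentiment classifier, the two keyword "attribute classifiers", the repeated-word fluency penalty (a crude stand-in for the BERT language model), and the weights `beta` and `eta` are all hypothetical. The search tries edits in increasing number, keeps only candidates that flip the black-box prediction, ranks them by edit count plus fluency and attribute-change penalties, and reports which attributes changed as the explanation.

```python
from itertools import combinations, product
from typing import Callable, Dict, List, Optional, Tuple

# --- Hypothetical toy stand-ins (assumptions, not CAT's real models) ---
POSITIVE = {"great", "good", "excellent"}
NEGATIVE = {"bad", "awful", "boring"}


def classify(tokens: List[str]) -> int:
    """Toy black-box sentiment classifier: 1 = positive, 0 = negative."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return 1 if score > 0 else 0


# Toy attribute classifiers: each maps a token list to a score in [0, 1].
ATTRIBUTES: Dict[str, Callable[[List[str]], float]] = {
    "praise": lambda toks: float(any(t in POSITIVE for t in toks)),
    "complaint": lambda toks: float(any(t in NEGATIVE for t in toks)),
}


def fluency_penalty(tokens: List[str]) -> float:
    """Crude stand-in for a language-model penalty: counts repeated adjacent words."""
    return float(sum(a == b for a, b in zip(tokens, tokens[1:])))


def contrastive(
    tokens: List[str],
    vocab: List[str],
    max_edits: int = 2,
    beta: float = 1.0,
    eta: float = 0.5,
) -> Optional[Tuple[List[str], Dict[str, float]]]:
    """Return the cheapest label-flipping edit and the attribute deltas that explain it."""
    original_label = classify(tokens)
    original_attrs = {k: g(tokens) for k, g in ATTRIBUTES.items()}
    best: Optional[Tuple[float, List[str]]] = None
    for n_edits in range(1, max_edits + 1):
        for positions in combinations(range(len(tokens)), n_edits):
            for words in product(vocab, repeat=n_edits):
                cand = list(tokens)
                for pos, word in zip(positions, words):
                    cand[pos] = word
                # A contrastive must actually change the black-box prediction.
                if classify(cand) == original_label:
                    continue
                n_changed = sum(a != b for a, b in zip(tokens, cand))
                attr_change = sum(
                    abs(g(cand) - original_attrs[k]) for k, g in ATTRIBUTES.items()
                )
                cost = n_changed + beta * fluency_penalty(cand) + eta * attr_change
                if best is None or cost < best[0]:
                    best = (cost, cand)
        if best is not None:
            break  # Minimality: stop at the smallest edit budget that flips the label.
    if best is None:
        return None
    cand = best[1]
    # The explanation: attributes whose classifier score changed, with direction.
    deltas = {
        k: g(cand) - original_attrs[k]
        for k, g in ATTRIBUTES.items()
        if g(cand) != original_attrs[k]
    }
    return cand, deltas


result = contrastive("the movie was boring".split(), vocab=["great", "bad", "fun"])
print(result)  # (['the', 'movie', 'was', 'great'], {'praise': 1.0, 'complaint': -1.0})
```

On this toy input the only single-word edit that flips the label replaces "boring" with "great", and the attribute deltas (praise added, complaint removed) play the role of an attribute-based explanation. Brute-force enumeration grows exponentially with `max_edits`, so anything beyond a toy would need a smarter candidate generator.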
