CL · Apr 4, 2020

Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection

arXiv:2004.02015v3 · 1023 citations
AI Analysis

This work addresses interpretability in NLP for users who need to understand black-box model decisions, though its contribution is incremental, since it builds on existing feature-based explanation methods.

The paper tackles explanation generation for neural text classifiers by detecting feature interactions. The resulting hierarchical explanations are reported to be both faithful to the models and interpretable to humans, as evaluated on three classifiers and two benchmark datasets.

Generating explanations for neural networks has become crucial for their real-world applications, where reliability and trustworthiness matter. In natural language processing, existing methods usually provide important features (words or phrases selected from the input text) as an explanation, but ignore the interactions between them. This makes it hard for humans to interpret an explanation and connect it to the model's prediction. In this work, we build hierarchical explanations by detecting feature interactions. Such explanations visualize how words and phrases are combined at different levels of the hierarchy, which can help users understand the decision-making of black-box models. The proposed method is evaluated with three neural text classifiers (LSTM, CNN, and BERT) on two benchmark datasets, via both automatic and human evaluations. Experiments show the effectiveness of the proposed method in providing explanations that are both faithful to models and interpretable to humans.
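To make the idea of a hierarchy built from feature interactions concrete, here is a minimal toy sketch. It is not the paper's algorithm: the abstract does not specify the interaction measure or the partitioning procedure, so the names `toy_score`, `interaction`, and `build_hierarchy`, the additive-deviation interaction score, and the top-down "split at the weakest interaction" rule are all assumptions made for illustration. A real use would replace `toy_score` with a classifier's output for the span.

```python
from typing import Callable, List, Sequence, Tuple

Span = Tuple[int, int]  # half-open token range [start, end)


def toy_score(tokens: Sequence[str]) -> float:
    """Hypothetical stand-in for a classifier's positive-class score.

    Assumption: a tiny sentiment lexicon where "not" flips the sign of
    the token that follows it.
    """
    lexicon = {"good": 1.0, "great": 1.5, "bad": -1.0}
    score = 0.0
    negate = False
    for tok in tokens:
        if tok == "not":
            negate = True
            continue
        value = lexicon.get(tok, 0.0)
        score += -value if negate else value
        negate = False
    return score


def interaction(score: Callable[[Sequence[str]], float],
                tokens: Sequence[str], start: int, mid: int, end: int) -> float:
    """Assumed interaction measure: how far the score of the whole span
    deviates from the sum of the scores of its two parts."""
    whole = score(tokens[start:end])
    left = score(tokens[start:mid])
    right = score(tokens[mid:end])
    return abs(whole - left - right)


def build_hierarchy(tokens: Sequence[str],
                    score: Callable[[Sequence[str]], float]) -> List[List[Span]]:
    """Top-down partition: at each level, split one span at the point
    where its two sides interact least, until every span is one token."""
    levels: List[List[Span]] = [[(0, len(tokens))]]
    while any(end - start > 1 for start, end in levels[-1]):
        current = levels[-1]
        best = None  # (interaction strength, span index, split point)
        for idx, (start, end) in enumerate(current):
            for mid in range(start + 1, end):
                strength = interaction(score, tokens, start, mid, end)
                if best is None or strength < best[0]:
                    best = (strength, idx, mid)
        _, idx, mid = best
        start, end = current[idx]
        levels.append(current[:idx] + [(start, mid), (mid, end)] + current[idx + 1:])
    return levels
```

Calling `build_hierarchy(["the", "movie", "is", "not", "good"], toy_score)` yields one partition per level. Under this toy scorer, the strongly interacting phrase "not good" stays together until the last split, which is the kind of grouping a hierarchical explanation is meant to surface.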

Code Implementations (2 repos)