CL · Apr 28, 2020

The Explanation Game: Towards Prediction Explainability through Sparse Communication

arXiv:2004.13876v2 · 1010 citations
AI Analysis

This work addresses the need for better explainability in NLP systems, offering a novel framework that could enhance trust and understanding for users, though it is incremental in building on existing methods.

The paper frames explainability in NLP as a communication task between an explainer and a layperson, and compares gradient, erasure, and attention methods under that framing. Experiments on text classification, natural language entailment, and machine translation find that attention-based explainers outperform the others. Human evaluations show promising results for post-hoc explainers trained to optimize communication success and faithfulness.

Explainability is a topic of growing importance in NLP. In this work, we provide a unified perspective of explainability as a communication problem between an explainer and a layperson about a classifier's decision. We use this framework to compare several prior approaches for extracting explanations, including gradient methods, representation erasure, and attention mechanisms, in terms of their communication success. In addition, we reinterpret these methods in light of classical feature selection, and we use this as inspiration to propose new embedded methods for explainability, through the use of selective, sparse attention. Experiments in text classification, natural language entailment, and machine translation, using different configurations of explainers and laypeople (including both machines and humans), reveal an advantage of attention-based explainers over gradient and erasure methods. Furthermore, human evaluation experiments show promising results with post-hoc explainers trained to optimize communication success and faithfulness.
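
The abstract's communication framing can be illustrated with a minimal sketch. It makes several assumptions that are not stated above: sparsemax is used as one well-known sparse attention transformation, the explanation "message" is taken to be the set of words with nonzero attention weight, and communication success is measured as the fraction of examples where the layperson's prediction matches the classifier's. The function names `sparsemax`, `extract_message`, and `communication_success_rate` are hypothetical and chosen for illustration only.

```python
def sparsemax(scores):
    """Project scores onto the probability simplex (sparsemax).

    Unlike softmax, low-scoring entries receive exactly zero weight.
    """
    z_sorted = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, z in enumerate(z_sorted, start=1):
        cumsum += z
        # Entry k is in the support while 1 + k * z_(k) > sum of top-k scores.
        if 1 + k * z > cumsum:
            tau = (cumsum - 1) / k
        else:
            break
    return [max(z - tau, 0.0) for z in scores]


def extract_message(tokens, scores):
    """Assumed message format: the words that keep nonzero attention weight."""
    weights = sparsemax(scores)
    return [tok for tok, w in zip(tokens, weights) if w > 0]


def communication_success_rate(classifier_preds, layperson_preds):
    """Assumed metric: how often the layperson recovers the classifier's label."""
    matches = sum(c == l for c, l in zip(classifier_preds, layperson_preds))
    return matches / len(classifier_preds)


# Hypothetical attention scores for a four-word input.
message = extract_message(["the", "movie", "was", "great"], [0.1, 1.0, 0.1, 0.8])
print(message)
```

In this sketch the explainer sends only the selected words to the layperson, which is why a transformation that produces exact zeros is convenient: the message is defined by the support of the attention distribution rather than by an arbitrary top-k cutoff.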

Code Implementations: 1 repo
