CLMar 23, 2021

SelfExplain: A Self-Explaining Architecture for Neural Text Classifiers

arXiv:2103.12279v2674 citations
Originality Incremental advance
AI Analysis

This addresses the need for interpretability in AI systems for users who require transparent decision-making, though it is incremental as it builds on existing neural classifiers.

The paper tackled the problem of making neural text classifiers interpretable by introducing SelfExplain, a self-explaining architecture that uses phrase-based concepts to explain predictions, and the result showed that it maintains performance across five datasets while providing explanations deemed sufficient and trustworthy by human judges.

We introduce SelfExplain, a novel self-explaining model that explains a text classifier's predictions using phrase-based concepts. SelfExplain augments existing neural classifiers by adding (1) a globally interpretable layer that identifies the most influential concepts in the training set for a given sample and (2) a locally interpretable layer that quantifies the contribution of each local input concept by computing a relevance score relative to the predicted label. Experiments across five text-classification datasets show that SelfExplain facilitates interpretability without sacrificing performance. Most importantly, explanations from SelfExplain show sufficiency for model predictions and are perceived as adequate, trustworthy and understandable by human judges compared to existing widely-used baselines.

Code Implementations2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes