CL May 20, 2019

Interpretable Neural Predictions with Differentiable Binary Variables

arXiv:1905.08160v2
1186 citations
Originality Incremental advance
AI Analysis

This paper addresses the need for interpretability in neural text classifiers. It offers a method that combines discrete selections with gradient-based training, though it is an incremental improvement over existing approaches.

The paper tackles the problem of making neural text classifiers more interpretable by having them justify their predictions. It proposes jointly training a model that selects rationales and a classifier that predicts from them, and reports performance competitive with previous work on rationale extraction.

The success of neural networks comes hand in hand with a desire for more interpretability. We focus on text classifiers and make them more interpretable by having them provide a justification, a rationale, for their predictions. We approach this problem by jointly training two neural network models: a latent model that selects a rationale (i.e. a short and informative part of the input text), and a classifier that learns from the words in the rationale alone. Previous work proposed to assign binary latent masks to input positions and to promote short selections via sparsity-inducing penalties such as L0 regularisation. We propose a latent model that mixes discrete and continuous behaviour allowing at the same time for binary selections and gradient-based training without REINFORCE. In our formulation, we can tractably compute the expected value of penalties such as L0, which allows us to directly optimise the model towards a pre-specified text selection rate. We show that our approach is competitive with previous work on rationale extraction, and explore further uses in attention mechanisms.
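The abstract's central idea, a gate that can be exactly 0 or 1 yet remains reparameterisable, with an expected L0 penalty available in closed form, can be sketched with a stretched-and-rectified Kumaraswamy variable. This is a minimal illustration under stated assumptions, not the authors' implementation: the stretch bounds `L` and `R`, all function names, and the absolute-gap penalty in `selection_rate_penalty` are choices made here for illustration.

```python
import math
import random

# Assumed stretch bounds: the Kumaraswamy sample on (0, 1) is stretched to
# (L, R) so that clipping to [0, 1] puts non-zero mass on exactly 0 and 1.
L, R = -0.1, 1.1


def kuma_cdf(k, a, b):
    """CDF of Kumaraswamy(a, b) at k in [0, 1]."""
    return 1.0 - (1.0 - k ** a) ** b


def kuma_icdf(u, a, b):
    """Inverse CDF: maps uniform noise u to a Kumaraswamy(a, b) sample."""
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)


def sample_gate(a, b, rng):
    """Reparameterised sample: uniform noise -> stretch -> clip to [0, 1]."""
    u = rng.random()
    k = kuma_icdf(u, a, b)
    t = L + (R - L) * k            # stretched sample on (L, R)
    return min(1.0, max(0.0, t))   # rectify: exact 0s and 1s are possible


def prob_zero(a, b):
    """P(gate == 0): mass of the stretched variable at or below 0."""
    return kuma_cdf(-L / (R - L), a, b)


def prob_one(a, b):
    """P(gate == 1): mass of the stretched variable at or above 1."""
    return 1.0 - kuma_cdf((1.0 - L) / (R - L), a, b)


def expected_l0(params):
    """Expected number of non-zero gates, computed without sampling."""
    return sum(1.0 - prob_zero(a, b) for a, b in params)


def selection_rate_penalty(params, target_rate):
    """Gap between expected selection rate and a target (illustrative form)."""
    rate = expected_l0(params) / len(params)
    return abs(rate - target_rate)
```

In a trainable model, the shape parameters `a` and `b` for each input position would be produced by a network, and the penalty would be added to the classification loss. Because `expected_l0` depends only on the CDF, it gives a deterministic training signal for the selection rate with no score-function estimator.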

Code Implementations 1 repo
