Rationalizing Neural Predictions
This work addresses the need for interpretable AI in domains such as sentiment analysis, although its contribution is incremental, since it builds on existing modular and regularization techniques.
The paper tackles the problem of making neural predictions interpretable by learning to extract short, coherent text fragments as justifications, without requiring rationale annotations during training. It outperforms attention-based baselines on multi-aspect sentiment analysis and demonstrates applicability to question retrieval.
Prediction without justification has limited applicability. As a remedy, we learn to extract pieces of input text as justifications (rationales) that are tailored to be short and coherent, yet sufficient for making the same prediction. Our approach combines two modular components, a generator and an encoder, which are trained to operate well together. The generator specifies a distribution over text fragments as candidate rationales, and these are passed through the encoder for prediction. Rationales are never given during training. Instead, the model is regularized by desiderata for rationales. We evaluate the approach on multi-aspect sentiment analysis against manually annotated test cases. Our approach outperforms an attention-based baseline by a significant margin. We also successfully illustrate the method on the question retrieval task.
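To make the generator/encoder split and the "short and coherent" desiderata concrete, here is a minimal sketch. It assumes the generator outputs a per-token selection probability, samples a binary mask z from independent Bernoulli draws, and that the desiderata are expressed as a penalty on the number of selected tokens (shortness) plus the number of on/off switches in the mask (coherence). That penalty form, the squared-error prediction loss, and every name below (`sample_mask`, `rationale_penalty`, `toy_encoder`, `rationale_cost`, `expected_cost`, `lam_sparse`, `lam_contig`) are illustrative assumptions, not the paper's stated implementation; the encoder is a hypothetical word-weight scorer standing in for a neural network.

```python
import random

def sample_mask(probs, rng):
    """Generator step (assumed): draw z_t ~ Bernoulli(p_t) independently per token."""
    return [1 if rng.random() < p else 0 for p in probs]

def rationale_penalty(z, lam_sparse, lam_contig):
    """Assumed regularizer encoding the desiderata for rationales.

    lam_sparse * (selected tokens)   -> favors short rationales
    lam_contig * (on/off switches)   -> favors coherent, contiguous fragments
    """
    length = sum(z)
    transitions = sum(abs(z[t] - z[t - 1]) for t in range(1, len(z)))
    return lam_sparse * length + lam_contig * transitions

def toy_encoder(tokens, z, weights):
    """Hypothetical stand-in encoder: predicts from the selected tokens only."""
    return sum(weights.get(tok, 0.0) for tok, keep in zip(tokens, z) if keep)

def rationale_cost(tokens, z, y, weights, lam_sparse=0.1, lam_contig=0.2):
    """Prediction loss on the rationale alone, plus the rationale regularizer."""
    pred = toy_encoder(tokens, z, weights)
    return (pred - y) ** 2 + rationale_penalty(z, lam_sparse, lam_contig)

def expected_cost(tokens, probs, y, weights, n_samples=100, seed=0):
    """Monte Carlo estimate of the cost averaged over masks sampled by the generator."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += rationale_cost(tokens, sample_mask(probs, rng), y, weights)
    return total / n_samples
```

Usage: with `tokens = ["the", "beer", "tastes", "great", "but", "smells", "off"]`, `weights = {"great": 1.0, "off": -0.5}` and target `y = 1.0`, the contiguous mask `[0, 0, 1, 1, 0, 0, 0]` and the scattered mask `[0, 1, 0, 1, 0, 1, 0]` both predict the target exactly, but the contiguous one receives the lower cost because it selects fewer tokens and switches on and off less often. This is the intuition behind regularizing toward rationales that are short and coherent yet still sufficient for the prediction.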