CLOct 14, 2021

The Irrationality of Neural Rationale Models

arXiv:2110.07550v2633 citationsHas Code
Originality Synthesis-oriented
AI Analysis

This work highlights a critical issue in interpretable NLP for researchers and practitioners, suggesting that current models might not achieve desired interpretability, making it incremental by questioning existing assumptions.

The paper argues that neural rationale models, which extract text segments for interpretable predictions, may not be as rational or interpretable as assumed, based on philosophical and empirical evidence, and calls for more rigorous evaluations.

Neural rationale models are popular for interpretable predictions of NLP tasks. In these, a selector extracts segments of the input text, called rationales, and passes these segments to a classifier for prediction. Since the rationale is the only information accessible to the classifier, it is plausibly defined as the explanation. Is such a characterization unconditionally correct? In this paper, we argue to the contrary, with both philosophical perspectives and empirical evidence suggesting that rationale models are, perhaps, less rational and interpretable than expected. We call for more rigorous and comprehensive evaluations of these models to ensure desired properties of interpretability are indeed achieved. The code can be found at https://github.com/yimingz89/Neural-Rationale-Analysis.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes