Are Shortest Rationales the Best Explanations for Human Understanding?
This work addresses a key problem in interpretable AI for researchers and practitioners: it shows that current rationale extraction methods may be suboptimal for human users. The contribution is incremental, since it builds on existing self-explaining models.
The paper tests the assumption that shorter rationales are better for human understanding of self-explaining models. It develops LimitedInk, a model that extracts rationales at any target length, and finds that overly short rationales do not improve human prediction accuracy over randomly masked text.
Existing self-explaining models typically favor extracting the shortest possible rationales (snippets of an input text "responsible for" the corresponding output) to explain the model prediction, on the assumption that shorter rationales are more intuitive to humans. However, this assumption has yet to be validated. Is the shortest rationale indeed the most human-understandable? To answer this question, we design a self-explaining model, LimitedInk, which allows users to extract rationales at any target length. Compared to existing baselines, LimitedInk achieves comparable end-task performance and human-annotated rationale agreement, making it a suitable representative of the recent class of self-explaining models. We use LimitedInk to conduct a user study on the impact of rationale length, in which we ask human judges to predict the sentiment label of documents based only on LimitedInk-generated rationales of different lengths. We show that rationales that are too short do not help humans predict labels better than randomly masked text, suggesting the need for more careful design of the best human rationales.
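To make the comparison in the user study concrete, the following is a minimal sketch of what "a rationale at a target length" and "randomly masked text" can look like at the token level. It is not LimitedInk's actual extraction procedure, which the abstract does not describe: the per-token importance scores, the `[MASK]` placeholder, and the function names `select_rationale` and `random_rationale` are all assumptions introduced for illustration, and simple top-k selection stands in for whatever the model learns.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder for hidden tokens


def _budget(num_tokens, length_ratio):
    """Number of tokens to keep for a target length given as a fraction of the input."""
    if not 0.0 <= length_ratio <= 1.0:
        raise ValueError("length_ratio must be between 0 and 1")
    return round(length_ratio * num_tokens)


def select_rationale(tokens, scores, length_ratio):
    """Keep the highest-scoring tokens up to the length budget; mask the rest.

    `scores` are assumed per-token importance values (one per token),
    standing in for whatever a self-explaining model would produce.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    k = _budget(len(tokens), length_ratio)
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    return [tok if i in keep else MASK for i, tok in enumerate(tokens)]


def random_rationale(tokens, length_ratio, seed=0):
    """Baseline: keep the same number of tokens, chosen uniformly at random."""
    k = _budget(len(tokens), length_ratio)
    rng = random.Random(seed)
    keep = set(rng.sample(range(len(tokens)), k))
    return [tok if i in keep else MASK for i, tok in enumerate(tokens)]
```

Calling both functions with the same `length_ratio` yields two masked versions of a document that reveal the same number of tokens, which is the kind of matched pair a judge would need in order to compare a model rationale against a random baseline at a fixed length.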