Teaching Meaningful Explanations
The paper addresses the need for interpretable AI in high-stakes applications such as healthcare and law. Its contribution is incremental: it builds on existing modeling methods and augments their training data with user-provided explanations.
The paper tackles the problem of generating comprehensible explanations for machine learning predictions in high-stakes domains by augmenting training data with user-provided explanations and learning a joint model for labels and explanations. Results show that this approach reliably produces meaningful explanations across multiple datasets and can sometimes improve accuracy.
The adoption of machine learning in high-stakes applications such as healthcare and law has lagged in part because predictions are not accompanied by explanations comprehensible to the domain user, who often holds the ultimate responsibility for decisions and outcomes. In this paper, we propose an approach to generate such explanations in which training data is augmented to include, in addition to features and labels, explanations elicited from domain users. A joint model is then learned to produce both labels and explanations from the input features. This simple idea ensures that explanations are tailored to the complexity expectations and domain knowledge of the consumer. Evaluation spans multiple modeling techniques on a game dataset, a (visual) aesthetics dataset, a chemical odor dataset, and a melanoma dataset, showing that our approach generalizes across domains and algorithms. Results demonstrate that meaningful explanations can be reliably taught to machine learning algorithms and, in some cases, can also improve modeling accuracy.
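The abstract does not say how the joint model is built. As a minimal sketch of one possible instantiation, the example below fuses each label and its user-provided explanation into a single target, trains an ordinary classifier on that fused target, and splits the prediction back into a label and an explanation. The fusing strategy, the k-nearest-neighbor classifier, the toy loan data, and every name here (`TRAIN`, `fit`, `predict`) are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter

# Hypothetical training triples: (features, label, explanation).
# Features are (income, debt), both scaled to [0, 1]. The explanations
# stand in for text that a domain user would supply for each example.
TRAIN = [
    ((0.9, 0.2), "approve", "income covers debt"),
    ((0.8, 0.1), "approve", "income covers debt"),
    ((0.3, 0.9), "deny", "debt too high"),
    ((0.2, 0.8), "deny", "debt too high"),
    ((0.1, 0.1), "deny", "income too low"),
    ((0.2, 0.2), "deny", "income too low"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit(examples):
    """Fuse label and explanation into one joint target per example.

    k-NN has no training step beyond storing the data, so the "model"
    is just the list of (features, (label, explanation)) pairs.
    """
    return [(x, (y, e)) for x, y, e in examples]


def predict(model, x, k=3):
    """Vote among the k nearest neighbors on the fused target, then split it."""
    neighbors = sorted(model, key=lambda item: squared_distance(item[0], x))[:k]
    votes = Counter(target for _, target in neighbors)
    (label, explanation), _ = votes.most_common(1)[0]
    return label, explanation


model = fit(TRAIN)
print(predict(model, (0.85, 0.15)))  # ('approve', 'income covers debt')
print(predict(model, (0.25, 0.85)))  # ('deny', 'debt too high')
```

Because the classifier only sees a fused target, any off-the-shelf multiclass learner could replace the k-NN vote in this sketch. The cost is that the number of classes grows with the number of distinct label and explanation pairs in the training data.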