Faithful and Plausible Explanations of Medical Code Predictions
This work addresses the need for interpretable AI in clinical medicine to support human-machine decision-making, though the contribution is incremental because it builds on existing proxy model approaches.
The paper tackles the problem of balancing faithfulness and plausibility in explanations of medical code predictions. It proposes a proxy model that provides fine-grained control over these trade-offs and demonstrates that the proxy replicates the trained model's behavior.
Machine learning models that offer excellent predictive performance often lack the interpretability necessary to support integrated human-machine decision-making. In clinical medicine and other high-risk settings, domain experts may be unwilling to trust model predictions without explanations. Work in explainable AI must balance competing objectives along two different axes: (1) explanations must balance faithfulness to the model's decision-making with plausibility to a domain expert, and (2) domain experts want both local explanations of individual predictions and global explanations of aggregate behavior. We propose to train a proxy model that mimics the behavior of the trained model and provides fine-grained control over these trade-offs. We evaluate our approach on the task of assigning ICD codes to clinical notes and demonstrate that explanations from the proxy model are faithful and replicate the trained model's behavior.
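The abstract does not specify the proxy architecture or its training objective, so the following is only a minimal sketch of the general proxy-model idea under stated assumptions. It assumes a linear proxy fit to the trained model's output probabilities, with an L1 penalty standing in as one possible knob for trading faithfulness against simpler explanations. The function `trained_model`, the four binary features, and all parameter values are hypothetical and are not taken from the paper.

```python
import itertools
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def trained_model(x):
    """Hypothetical black box: probability of one code from 4 binary features."""
    score = 2.0 * x[0] + 1.5 * x[1] * x[2] - 1.0 * x[3] - 0.25
    return sigmoid(score)


def proxy_prob(x, w, b):
    """Linear proxy: sigmoid of a weighted sum of the features."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_proxy(inputs, targets, l1=0.0, lr=0.5, epochs=500):
    """Fit the proxy to the black box's probabilities (soft labels).

    Assumption: l1 is an illustrative sparsity knob, applied with a
    soft-thresholding step after each gradient update.
    """
    n, d = len(inputs), len(inputs[0])
    w, b = [0.0] * d, 0.0
    shrink = lr * l1
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(inputs, targets):
            err = proxy_prob(x, w, b) - t  # cross-entropy gradient w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        for j in range(d):
            wj = w[j] - lr * grad_w[j] / n
            if wj > shrink:
                w[j] = wj - shrink
            elif wj < -shrink:
                w[j] = wj + shrink
            else:
                w[j] = 0.0
        b -= lr * grad_b / n
    return w, b


def agreement(inputs, w, b):
    """Faithfulness stand-in: share of inputs where both models make the same call."""
    same = sum(
        (proxy_prob(x, w, b) > 0.5) == (trained_model(x) > 0.5) for x in inputs
    )
    return same / len(inputs)


def local_explanation(x, w):
    """Local view: per-feature contribution to the proxy's logit for one input."""
    return [wj * xj for wj, xj in zip(w, x)]


# Enumerate all 16 binary inputs and query the black box for soft labels.
inputs = list(itertools.product([0, 1], repeat=4))
targets = [trained_model(x) for x in inputs]

dense_w, dense_b = train_proxy(inputs, targets, l1=0.0)
# With these settings the penalty is strong enough to zero every weight.
sparse_w, sparse_b = train_proxy(inputs, targets, l1=1.0)

dense_agreement = agreement(inputs, dense_w, dense_b)
sparse_agreement = agreement(inputs, sparse_w, sparse_b)

print("global weights (no penalty):", [round(wj, 2) for wj in dense_w])
print("agreement (no penalty):", dense_agreement)
print("agreement (strong penalty):", sparse_agreement)
print("local contributions for", inputs[-1], ":",
      [round(c, 2) for c in local_explanation(inputs[-1], dense_w)])
```

In this sketch the proxy's weights play the role of a global explanation, and the per-feature contributions for a single input play the role of a local one. Comparing the two printed agreement scores gives a rough sense of how much faithfulness is given up when the penalty forces a simpler proxy.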