Producing radiologist-quality reports for interpretable artificial intelligence
This work addresses the need for interpretable AI in medical diagnosis by providing language-based explanations, though the contribution is incremental because it builds on existing interpretability methods.
The paper tackles the problem of explaining deep learning decisions in medical tasks by proposing a model-agnostic method that generates descriptive sentences. The method was tested on hip fracture detection from frontal pelvic x-rays. The generated sentences contained the desired information, doctors preferred them over saliency maps, and combining the text with visualisations worked better than either alone.
Current approaches to explaining the decisions of deep learning systems for medical tasks have focused on visualising the elements that have contributed to each decision. We argue that such approaches are not enough to "open the black box" of medical decision making systems because they are missing a key component that has been used as a standard communication tool between doctors for centuries: language. We propose a model-agnostic interpretability method that involves training a simple recurrent neural network model to produce descriptive sentences to clarify the decision of deep learning classifiers. We test our method on the task of detecting hip fractures from frontal pelvic x-rays. This process requires minimal additional labelling despite producing text containing elements that the original deep learning classification model was not specifically trained to detect. The experimental results show that: 1) the sentences produced by our method consistently contain the desired information, 2) the generated sentences are preferred by doctors compared to current tools that create saliency maps, and 3) the combination of visualisations and generated text is better than either alone.
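The core idea, a small recurrent network that turns a classifier's internal features into a sentence, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the vocabulary, the feature and hidden dimensions, the Elman-style recurrence, and the class name `TinyRNNDecoder`. The weights are random and untrained, so the generated words are meaningless; the sketch only shows how a feature vector from any frozen classifier could condition a greedy word-by-word decoder.

```python
import math
import random

# Hypothetical vocabulary; the paper's actual vocabulary is not given here.
VOCAB = ["<start>", "<end>", "there", "is", "a", "no",
         "fracture", "of", "the", "neck", "femur"]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyRNNDecoder:
    """Elman-style RNN decoder conditioned on classifier features (assumed design)."""

    def __init__(self, feat_dim, hidden_dim, vocab_size, seed=0):
        rng = random.Random(seed)
        self.w_init = rand_matrix(hidden_dim, feat_dim, rng)     # features -> initial state
        self.w_embed = rand_matrix(hidden_dim, vocab_size, rng)  # previous word -> hidden
        self.w_hh = rand_matrix(hidden_dim, hidden_dim, rng)     # hidden -> hidden
        self.w_out = rand_matrix(vocab_size, hidden_dim, rng)    # hidden -> word scores

    def init_state(self, features):
        # The classifier's features set the starting hidden state, which is
        # what makes the decoder independent of the classifier's architecture.
        return [math.tanh(x) for x in matvec(self.w_init, features)]

    def step(self, token_id, hidden):
        recurrent = matvec(self.w_hh, hidden)
        # row[token_id] is the one-hot lookup of the previous word's embedding.
        new_hidden = [math.tanh(row[token_id] + r)
                      for row, r in zip(self.w_embed, recurrent)]
        return matvec(self.w_out, new_hidden), new_hidden

    def generate(self, features, max_len=10):
        """Greedy decoding: pick the highest-scoring word at each step."""
        hidden = self.init_state(features)
        token = VOCAB.index("<start>")
        words = []
        for _ in range(max_len):
            logits, hidden = self.step(token, hidden)
            token = max(range(len(logits)), key=logits.__getitem__)
            if VOCAB[token] == "<end>":
                break
            words.append(VOCAB[token])
        return words


# Hypothetical feature vector, e.g. penultimate-layer activations of a frozen classifier.
features = [0.1, -0.3, 0.7, 0.2]
decoder = TinyRNNDecoder(feat_dim=4, hidden_dim=8, vocab_size=len(VOCAB))
sentence = decoder.generate(features)
print(" ".join(sentence))
```

In a real setting the decoder weights would be fitted, for example with a cross-entropy loss, on a small set of images paired with descriptive sentences while the classifier stays fixed; that training step is omitted here.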