Testing the robustness of attribution methods for convolutional neural networks in MRI-based Alzheimer's disease classification
This work addresses the reliability of interpretability tools for medical AI, a question that matters to both clinicians and researchers. Its contribution is incremental, however, because it focuses on comparing existing methods.
The study tested the robustness of four attribution methods for CNN-based Alzheimer's disease classification from MRI data, finding that some widely used methods produce highly inconsistent outcomes.
Attribution methods are easy-to-use tools for investigating and validating machine learning models. Multiple methods have been suggested in the literature, and it is not yet clear which method is most suitable for a given task. In this study, we tested the robustness of four attribution methods, namely gradient*input, guided backpropagation, layer-wise relevance propagation, and occlusion, for the task of Alzheimer's disease classification. We repeatedly trained a convolutional neural network (CNN) with identical training settings to separate structural MRI data of patients with Alzheimer's disease from those of healthy controls. Afterwards, we produced attribution maps for each subject in the test data and quantitatively compared them across models and attribution methods. We show that visual comparison is not sufficient and that some widely used attribution methods produce highly inconsistent outcomes.
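The comparison described in the abstract can be illustrated with a small sketch. It is not the study's pipeline: the abstract does not name the similarity measure, so the mean pairwise Pearson correlation used here is an assumption, as are the function names `occlusion_map` and `mean_pairwise_correlation`, the patch size, the zero baseline, and the toy linear "models" that stand in for independently trained CNNs.

```python
import numpy as np

def occlusion_map(predict, volume, patch=4, baseline=0.0):
    """Occlusion attribution for a 3D volume.

    Each cubic patch is replaced by `baseline`; the attribution of the
    patch is the resulting drop in the model output.
    `predict` is any callable mapping a volume to a scalar score.
    """
    reference = predict(volume)
    attribution = np.zeros_like(volume, dtype=float)
    for x in range(0, volume.shape[0], patch):
        for y in range(0, volume.shape[1], patch):
            for z in range(0, volume.shape[2], patch):
                occluded = volume.copy()
                occluded[x:x + patch, y:y + patch, z:z + patch] = baseline
                drop = reference - predict(occluded)
                attribution[x:x + patch, y:y + patch, z:z + patch] = drop
    return attribution

def mean_pairwise_correlation(maps):
    """Mean Pearson correlation over all pairs of attribution maps.

    Assumed robustness score: values near 1 mean the maps agree across
    training runs, values near 0 mean they are largely unrelated.
    """
    flat = np.stack([np.ravel(m) for m in maps]).astype(float)
    corr = np.corrcoef(flat)
    upper = np.triu_indices(len(maps), k=1)  # each pair counted once
    return float(corr[upper].mean())

# Toy stand-ins for repeated training runs (hypothetical, not CNNs):
# linear scorers whose weights differ only by a small random perturbation.
rng = np.random.default_rng(0)
volume = rng.random((8, 8, 8))
base_weights = rng.normal(size=volume.shape)
models = [base_weights + 0.1 * rng.normal(size=volume.shape) for _ in range(3)]

maps = [
    occlusion_map(lambda v, w=w: float((w * v).sum()), volume)
    for w in models
]
print("mean pairwise correlation:", mean_pairwise_correlation(maps))
```

A single number per attribution method makes it possible to rank methods by consistency across training runs, which visual inspection of individual maps cannot do reliably.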