Interpretability of Machine Learning Methods Applied to Neuroimaging
This work helps neuroimaging researchers choose and validate interpretability methods so that they can check model reliability. Its contribution is incremental: it synthesizes existing approaches rather than introducing new techniques.
The paper addresses the challenge of selecting and evaluating interpretability methods for deep learning models in neuroimaging, noting that models can achieve high performance using irrelevant features due to dataset biases. It provides an overview of common interpretability methods and metrics to assess their reliability, focusing on representative studies in this context.
Deep learning methods have become very popular for processing natural images and have since been successfully adapted to neuroimaging. Because these methods are not transparent, interpretability methods are needed to validate them and ensure their reliability. Indeed, it has been shown that deep learning models may achieve high performance even when they rely on irrelevant features, by exploiting biases in the training set. Interpretability methods can potentially detect such undesirable situations. Many methods have recently been proposed to interpret neural networks, but the field is not yet mature. Machine learning users face two major questions when they try to interpret their models: which method should they choose, and how can they assess its reliability? Here, we aim to answer these questions by presenting the most common interpretability methods and the metrics developed to assess their reliability, as well as their applications and benchmarks in the neuroimaging context. Note that this is not an exhaustive survey: we focus on the studies that we found to be the most representative and relevant.
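The failure mode described above, high performance obtained from an irrelevant feature, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the studies covered here: the data are synthetic, the model is a plain logistic regression rather than a deep network, the function names (`make_data`, `train_logreg`, `gradient_saliency`) are hypothetical, and the attribution is a simple gradient magnitude, one of many possible interpretability methods.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_data(n, biased, seed):
    """Synthetic features: [weak true signal, artifact, pure noise]."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    sign = 2 * y - 1
    signal = 0.3 * sign + rng.normal(0.0, 1.0, n)
    # Biased set: the artifact tracks the label. Otherwise its sign is random.
    artifact_sign = sign if biased else rng.choice([-1, 1], size=n)
    artifact = 2.0 * artifact_sign + rng.normal(0.0, 0.5, n)
    noise = rng.normal(0.0, 1.0, n)
    return np.stack([signal, artifact, noise], axis=1), y

def train_logreg(X, y, lr=0.1, epochs=500):
    """Logistic regression fitted by full-batch gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def accuracy(X, y, w, b):
    return np.mean((sigmoid(X @ w + b) > 0.5).astype(int) == y)

def gradient_saliency(X, w, b):
    """Mean absolute gradient of the predicted probability w.r.t. each input."""
    p = sigmoid(X @ w + b)
    grads = (p * (1.0 - p))[:, None] * w[None, :]  # dp/dx = p(1-p) * w
    return np.abs(grads).mean(axis=0)

if __name__ == "__main__":
    X_train, y_train = make_data(2000, biased=True, seed=0)
    X_test, y_test = make_data(2000, biased=False, seed=1)
    w, b = train_logreg(X_train, y_train)
    print("accuracy on biased training set:", accuracy(X_train, y_train, w, b))
    print("accuracy once the bias is removed:", accuracy(X_test, y_test, w, b))
    print("saliency [signal, artifact, noise]:", gradient_saliency(X_train, w, b))
```

In this sketch the accuracy on the biased set alone gives no hint of a problem, whereas the saliency vector points at the artifact feature. This is the kind of check that interpretability methods are meant to provide, although in real neuroimaging data the irrelevant feature is rarely known in advance.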