IV CV LG · Jan 27, 2025

Skull-stripping induces shortcut learning in MRI-based Alzheimer's disease classification

Christian Tinauer, Maximilian Sackl, Rudolf Stollberger, Reinhold Schmidt, Stefan Ropele, Christian Langkammer

arXiv:2501.15831v45.1 · h-index: 39 · Has Code · Insights into Imaging

Originality: Incremental advance

AI Analysis

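The finding exposes a bias in medical imaging AI: classifiers can reach high accuracy by exploiting preprocessing artifacts rather than disease-related tissue signal. That undermines trust in such models and their robustness, and it underscores the need for interpretability tools to catch misleading results in Alzheimer's diagnosis.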

The study investigated why deep neural networks achieve high accuracy in Alzheimer's disease classification from MRI, finding that models rely on volumetric features from skull-stripping artifacts rather than gray-white matter texture, with performance remaining stable across preprocessing variations. This indicates shortcut learning where preprocessing cues, not intended biological signals, drive decisions.

Objectives: High classification accuracy of Alzheimer's disease (AD) from structural MRI has been achieved using deep neural networks, yet the specific image features contributing to these decisions remain unclear. In this study, the contributions of T1-weighted (T1w) gray-white matter texture, volumetric information, and preprocessing -- particularly skull-stripping -- were systematically assessed.

Methods: A dataset of 990 matched T1w MRIs from AD patients and cognitively normal controls from the ADNI database was used. Preprocessing was varied through skull-stripping and intensity binarization to isolate texture and shape contributions. A 3D convolutional neural network was trained on each configuration, and classification performance was compared using exact McNemar tests with discrete Bonferroni-Holm correction. Feature relevance was analyzed using Layer-wise Relevance Propagation, image similarity metrics, and spectral clustering of relevance maps.

Results: Despite substantial differences in image content, classification accuracy, sensitivity, and specificity remained stable across preprocessing conditions. Models trained on binarized images preserved performance, indicating minimal reliance on gray-white matter texture. Instead, volumetric features -- particularly brain contours introduced through skull-stripping -- were consistently used by the models.

Conclusions: This behavior reflects a shortcut learning phenomenon, where preprocessing artifacts act as potentially unintended cues. The resulting Clever Hans effect emphasizes the critical importance of interpretability tools to reveal hidden biases and to ensure robust and trustworthy deep learning in medical imaging.
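The statistical comparison named in the Methods (exact McNemar tests followed by a Holm-type correction) can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, the inputs are assumed to be paired per-subject predictions from two models on the same test set, and the correction shown is the standard Bonferroni-Holm step-down procedure, not the discrete variant the abstract refers to.

```python
from math import comb


def discordant_counts(y_true, pred_a, pred_b):
    """Count paired disagreements between two classifiers.

    b: cases model A got right and model B got wrong.
    c: cases model A got wrong and model B got right.
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    return b, c


def exact_mcnemar(b, c):
    """Two-sided exact McNemar p-value.

    Under the null hypothesis, each discordant pair falls on either side
    with probability 0.5, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def holm_adjust(p_values):
    """Standard Bonferroni-Holm adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, ...
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted values.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

In use, one would compute `discordant_counts` for each pair of preprocessing configurations, pass the counts to `exact_mcnemar`, and adjust the resulting list of p-values with `holm_adjust`. Adjusted p-values that stay above the chosen significance level would be consistent with the stable performance the Results describe.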
