X-SiT: Inherently Interpretable Surface Vision Transformers for Dementia Diagnosis
The work addresses the need for interpretable models in clinical decision-making for neurological disorders, though the contribution is incremental because it builds on existing surface-data and vision transformer methods.
The paper tackles the challenge of interpreting 3D brain imaging data for dementia diagnosis by developing X-SiT, an inherently interpretable neural network that uses cortical surface renderings and achieves state-of-the-art performance in detecting Alzheimer's disease and frontotemporal dementia.
Interpretable models are crucial for supporting clinical decision-making, driving advances in their development and application for medical images. However, the nature of 3D volumetric data makes it inherently challenging to visualize and interpret intricate and complex structures like the cerebral cortex. Cortical surface renderings, on the other hand, provide a more accessible and understandable 3D representation of brain anatomy, facilitating visualization and interactive exploration. Motivated by this advantage and the widespread use of surface data for studying neurological disorders, we present the eXplainable Surface Vision Transformer (X-SiT). This is the first inherently interpretable neural network that offers human-understandable predictions based on interpretable cortical features. As part of X-SiT, we introduce a prototypical surface patch decoder for classifying surface patch embeddings, incorporating case-based reasoning with spatially corresponding cortical prototypes. The results demonstrate state-of-the-art performance in detecting Alzheimer's disease and frontotemporal dementia while additionally providing informative prototypes that align with known disease patterns and reveal classification errors.
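The idea of classifying surface patch embeddings against spatially corresponding prototypes can be illustrated with a minimal sketch. This is not the X-SiT decoder itself: the abstract does not specify the similarity measure, the tensor shapes, or how patch evidence is aggregated, so the negative squared Euclidean distance, the one-prototype-per-class-per-patch layout, the summation over patches, and the function name `prototype_decoder` are all assumptions made for illustration.

```python
import numpy as np


def prototype_decoder(patch_embeddings, prototypes):
    """Hypothetical prototype-based classifier for one subject.

    patch_embeddings: array of shape (P, D), one embedding per surface patch.
    prototypes: array of shape (C, P, D), one prototype per class and per
        patch position (the "spatially corresponding" assumption).
    Returns the predicted class index and a (C, P) similarity map.
    """
    # Compare each patch only with the prototypes at the same patch position.
    diffs = prototypes - patch_embeddings[None, :, :]   # (C, P, D)
    sq_dist = np.sum(diffs ** 2, axis=-1)               # (C, P)

    # Assumed similarity: negative squared Euclidean distance.
    similarities = -sq_dist

    # Assumed aggregation: sum patch-level evidence into one score per class.
    class_scores = similarities.sum(axis=1)             # (C,)
    return int(np.argmax(class_scores)), similarities


# Toy data (random, not from the paper): 3 classes, 4 patches, 8-dim embeddings.
rng = np.random.default_rng(0)
num_classes, num_patches, dim = 3, 4, 8
prototypes = rng.normal(size=(num_classes, num_patches, dim))

# A synthetic subject whose patches lie close to the class-1 prototypes.
subject = prototypes[1] + 0.01 * rng.normal(size=(num_patches, dim))

predicted, sims = prototype_decoder(subject, prototypes)
print("predicted class:", predicted)
print("per-patch similarity map shape:", sims.shape)
```

The `(C, P)` similarity map is what makes this style of classifier inspectable: each entry states how strongly one patch resembles the prototype of one class at the same cortical location, so a prediction can be traced back to the patches that drove it.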