An Interpretable Cross-Attentive Multi-modal MRI Fusion Framework for Schizophrenia Diagnosis
This work addresses multi-modal MRI fusion for mental disorder diagnosis, offering an incremental improvement along with interpretable insights into schizophrenia biomarkers.
The paper tackles the challenge of combining fMRI and sMRI for schizophrenia diagnosis by proposing a Cross-Attentive Multi-modal Fusion framework (CAMF) that captures intra- and inter-modal relationships, resulting in improved classification accuracy on two datasets.
Both functional and structural magnetic resonance imaging (fMRI and sMRI) are widely used for the diagnosis of mental disorders. However, combining the complementary information from these two modalities is challenging because of their heterogeneity. Many existing methods fail to capture the interaction between the modalities and instead default to a simple combination of their latent features. In this paper, we propose a novel Cross-Attentive Multi-modal Fusion framework (CAMF), which is designed to capture both intra-modal and inter-modal relationships between fMRI and sMRI and thereby enhance the multi-modal data representation. Specifically, CAMF employs self-attention modules to identify interactions within each modality and cross-attention modules to identify interactions between modalities. It then optimizes the integration of the latent features from both modalities. Evaluations on two large multi-modal brain imaging datasets show that CAMF significantly improves classification accuracy and consistently outperforms existing methods. Furthermore, gradient-guided Score-CAM is applied to interpret the critical functional networks and brain regions involved in schizophrenia. The biomarkers identified by CAMF align with established research, potentially offering new insights into the diagnosis and pathological endophenotypes of schizophrenia.
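The abstract does not specify layer sizes, the number of attention heads, or how the attended features are finally integrated, so the following is only a minimal NumPy sketch of the general self-attention plus cross-attention fusion pattern it describes, not the CAMF implementation. Assumptions (all hypothetical): each modality arrives as a set of token vectors, a single attention head is used, random matrices stand in for learned projections, and the final integration is mean pooling followed by concatenation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q_in, kv_in, wq, wk, wv):
    # Scaled dot-product attention: queries from q_in, keys/values from kv_in.
    q, k, v = q_in @ wq, kv_in @ wk, kv_in @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def init_proj(rng, d_in, d_k):
    # Hypothetical random (query, key, value) projections standing in for learned weights.
    return [rng.standard_normal((d_in, d_k)) / np.sqrt(d_in) for _ in range(3)]

def camf_style_fusion(f_tokens, s_tokens, d_k=16, seed=0):
    """Fuse fMRI tokens (n_f, d_f) and sMRI tokens (n_s, d_s) into one vector.

    Illustrative only: the pooling/concatenation step is an assumption,
    not the integration scheme used by CAMF.
    """
    rng = np.random.default_rng(seed)
    # Intra-modal: each modality attends to its own tokens (self-attention).
    f_self = attention(f_tokens, f_tokens, *init_proj(rng, f_tokens.shape[1], d_k))
    s_self = attention(s_tokens, s_tokens, *init_proj(rng, s_tokens.shape[1], d_k))
    # Inter-modal: each modality queries the other (cross-attention).
    f_cross = attention(f_self, s_self, *init_proj(rng, d_k, d_k))
    s_cross = attention(s_self, f_self, *init_proj(rng, d_k, d_k))
    # Mean-pool over tokens and concatenate into a joint representation.
    return np.concatenate([f_self.mean(0), f_cross.mean(0),
                           s_self.mean(0), s_cross.mean(0)])
```

For example, with 10 hypothetical fMRI tokens of width 32 and 8 hypothetical sMRI tokens of width 24, `camf_style_fusion` returns a 64-dimensional vector (four pooled 16-dimensional blocks) that a downstream classifier could consume. Letting fMRI tokens query sMRI tokens, and vice versa, is what makes each modality's representation depend on the other, which a plain concatenation of latent features does not do.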
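The gradient-guided variant of Score-CAM mentioned above is not detailed in the abstract, so the sketch below shows only plain Score-CAM, as an assumption about the underlying idea: each activation map from a chosen layer is upsampled, normalized to [0, 1], and used as a mask on the input; the map's weight is the resulting increase in the target-class score over a zero-input baseline, and the saliency map is the ReLU of the weighted sum. The toy `toy_score` function and activation maps are hypothetical stand-ins for a trained network.

```python
import numpy as np

def upsample_nearest(a, out_shape):
    # Nearest-neighbour upsampling; assumes out_shape is an integer multiple of a.shape.
    ry, rx = out_shape[0] // a.shape[0], out_shape[1] // a.shape[1]
    return np.repeat(np.repeat(a, ry, axis=0), rx, axis=1)

def score_cam(x, activations, score_fn):
    """Plain (not gradient-guided) Score-CAM saliency for a single 2-D input.

    x:           input array of shape (H, W)
    activations: maps from a chosen layer, shape (K, h, w)
    score_fn:    callable mapping an (H, W) input to a scalar target-class score
    """
    baseline = score_fn(np.zeros_like(x))
    cam = np.zeros(x.shape)
    for a in activations:
        up = upsample_nearest(a, x.shape)
        span = up.max() - up.min()
        if span == 0:
            continue  # a constant map cannot localize anything
        mask = (up - up.min()) / span           # normalize the mask to [0, 1]
        weight = score_fn(x * mask) - baseline  # increase in class confidence
        cam += weight * up
    return np.maximum(cam, 0.0)  # ReLU keeps only regions that raise the score

# Hypothetical toy model: the "class score" depends only on the top-left quadrant.
def toy_score(z):
    return 1.0 / (1.0 + np.exp(-z[:4, :4].sum()))

x = np.ones((8, 8))
acts = np.zeros((3, 4, 4))
acts[0, :2, :2] = 1.0   # map covering the top-left quadrant
acts[1, 2:, 2:] = 1.0   # map covering the bottom-right quadrant
acts[2] = 0.5           # constant map, skipped
cam = score_cam(x, acts, toy_score)
```

In this toy case only the top-left quadrant receives positive saliency, because only the map covering it raises the score. In an applied setting, the input would be the imaging data and `score_fn` the trained classifier's output for the target class.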