CV · Nov 12, 2022

MultiCrossViT: Multimodal Vision Transformer for Schizophrenia Prediction using Structural MRI and Functional Network Connectivity Data

arXiv:2211.06726v2 · 4 citations · h-index: 35
Originality: Incremental advance
AI Analysis

This work addresses schizophrenia diagnosis from medical imaging. It is an incremental advance that applies a multimodal Vision Transformer to a specific clinical domain.

The authors tackled schizophrenia prediction by developing MultiCrossViT, a multimodal Vision Transformer that analyzes structural MRI and functional network connectivity data, and report an AUC of 0.832 on a dataset with few training subjects.
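To make the reported AUC of 0.832 concrete: the AUC is the probability that a classifier scores a randomly chosen positive case (here, a schizophrenia patient) above a randomly chosen negative case (a control), with ties counting as half. A minimal pure-Python sketch of that pairwise definition follows; the function name `auc_rank` and the toy scores are illustrative assumptions, not data from the paper.

```python
def auc_rank(scores, labels):
    """Pairwise (Mann-Whitney) AUC: fraction of (positive, negative) pairs
    in which the positive (label 1) outscores the negative (label 0).
    Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
print(auc_rank([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # → 0.75
```

This double loop is O(n·m) and only meant to show the definition; scikit-learn's `roc_auc_score` computes the same quantity for binary labels more efficiently.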

Vision Transformer (ViT) is a pioneering deep learning framework for real-world computer vision tasks such as image classification and object recognition. Importantly, ViTs have been shown to outperform traditional deep learning models such as convolutional neural networks (CNNs) on many tasks. Recently, a number of ViT variants have been adapted to medical imaging, addressing a variety of critical classification and segmentation challenges, particularly for brain imaging data. In this work, we present a novel multimodal deep learning pipeline, MultiCrossViT, which analyzes both structural MRI (sMRI) and static functional network connectivity (sFNC) data to predict schizophrenia. On a dataset with few training subjects, our model achieves an AUC of 0.832. Finally, by extracting features from the transformer encoders, we use the resulting ViT attention maps to visualize the brain regions and covariance patterns most relevant to schizophrenia.
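The abstract does not describe how MultiCrossViT combines the two modalities. As a rough, hypothetical illustration of the general idea only, the NumPy sketch below builds sFNC in its common form (Pearson correlations between component time courses, keeping the upper triangle) and lets sMRI tokens attend to sFNC tokens through one head of scaled dot-product cross-attention. Everything here is an assumption, not the authors' architecture: the helper names `sfnc_tokens` and `cross_attention`, the token width, the random weights, and the synthetic inputs.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sfnc_tokens(timecourses, token_dim):
    """Hypothetical helper: static FNC as Pearson correlations between
    component time courses (shape T x C), upper triangle only, zero-padded
    and chunked into tokens of width token_dim."""
    fnc = np.corrcoef(timecourses, rowvar=False)   # (C, C) correlation matrix
    iu = np.triu_indices_from(fnc, k=1)            # skip the diagonal of ones
    vec = fnc[iu]                                  # C * (C - 1) / 2 values
    pad = (-len(vec)) % token_dim                  # pad to a whole number of tokens
    vec = np.concatenate([vec, np.zeros(pad)])
    return vec.reshape(-1, token_dim)              # (n_tokens, token_dim)


def cross_attention(queries, keys_values, w_q, w_k, w_v):
    """Hypothetical helper: single-head scaled dot-product cross-attention,
    where tokens of one modality (queries) attend to the other (keys/values)."""
    q = queries @ w_q
    k = keys_values @ w_k
    v = keys_values @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (n_queries, n_keys)
    attn = softmax(scores, axis=-1)                # each row sums to 1
    return attn @ v, attn


# Synthetic inputs only; shapes are illustrative assumptions.
rng = np.random.default_rng(0)
d = 16
smri = rng.standard_normal((8, d))      # 8 hypothetical sMRI patch embeddings
tc = rng.standard_normal((200, 10))     # 200 time points x 10 components
fnc = sfnc_tokens(tc, d)                # 45 correlations -> 3 tokens of width 16
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
fused, attn = cross_attention(smri, fnc, w_q, w_k, w_v)
print(fused.shape, attn.shape)          # (8, 16) (8, 3)
```

The `attn` matrix (one row per sMRI token, one column per sFNC token) is the kind of quantity an attention-map analysis inspects. A trained model would learn `w_q`, `w_k`, and `w_v` rather than drawing them at random, and would typically stack several heads and layers.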
