IV | CV | Jun 16, 2025

MultiViT2: A Data-augmented Multimodal Neuroimaging Prediction Framework via Latent Diffusion Model

arXiv:2506.13667v1 | 1 citation | h-index: 21
Originality: Incremental advance
AI Analysis

This work addresses the challenge of enhancing predictive outcomes in medical imaging for conditions like schizophrenia, though it appears incremental as it builds on a previous model.

The study tackled the problem of improving neuroimaging prediction by integrating structural and functional data, resulting in MultiViT2, which significantly outperformed the first-generation model in schizophrenia classification accuracy.

Multimodal medical imaging integrates diverse data types, such as structural and functional neuroimaging, to provide complementary insights that enhance deep learning predictions and improve outcomes. This study focuses on a neuroimaging prediction framework based on both structural and functional neuroimaging data. We propose a next-generation prediction model, MultiViT2, which combines a pretrained representation-learning base model with a vision transformer backbone for prediction output. Additionally, we develop a data augmentation module based on the latent diffusion model that enriches input data by generating augmented neuroimaging samples, thereby enhancing predictive performance through reduced overfitting and improved generalizability. We show that MultiViT2 significantly outperforms the first-generation model in schizophrenia classification accuracy and demonstrates strong scalability and portability.
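The augmentation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' module: it shows only the forward noising step that diffusion models apply in latent space, used here to create lightly perturbed copies of each sample. A real latent diffusion model would generate new samples with a trained denoiser run in reverse and decode them with a trained autoencoder, and both are omitted. The function names, the linear noise schedule, the latent size and the toy data are all assumptions made for illustration.

```python
import numpy as np


def make_alpha_bar(num_steps=100, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def noise_latents(z0, t, alpha_bar, rng):
    """Forward diffusion step: z_t = sqrt(a_bar_t) * z0 + sqrt(1 - a_bar_t) * eps."""
    eps = rng.standard_normal(z0.shape)
    a = alpha_bar[t]
    return np.sqrt(a) * z0 + np.sqrt(1.0 - a) * eps


def augment(latents, labels, copies, t, alpha_bar, seed=0):
    """Append `copies` noised versions of every latent code, reusing its label."""
    rng = np.random.default_rng(seed)
    out_z, out_y = [latents], [labels]
    for _ in range(copies):
        out_z.append(noise_latents(latents, t, alpha_bar, rng))
        out_y.append(labels)
    return np.concatenate(out_z), np.concatenate(out_y)


# Hypothetical toy data: 8 subjects, 16-dimensional latent codes, binary labels.
rng = np.random.default_rng(42)
latents = rng.standard_normal((8, 16))
labels = np.array([0, 1, 0, 1, 0, 1, 0, 1])

alpha_bar = make_alpha_bar()
aug_z, aug_y = augment(latents, labels, copies=2, t=10, alpha_bar=alpha_bar)
print(aug_z.shape, aug_y.shape)
```

A small timestep `t` keeps the noised copies close to the originals, so the labels plausibly still apply; a larger `t` would push the copies toward pure noise and make label reuse unsafe.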

