LG CVMar 12, 2025

Towards Robust Multimodal Representation: A Unified Approach with Adaptive Experts and Alignment

Nazanin Moradinasab, Saurav Sengupta, Jiebei Liu, Sana Syed, Donald E. Brown

arXiv:2503.09498v19.42 citationsh-index: 10Has Code2025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)

Originality Incremental advance

AI Analysis

This work addresses unreliable predictions in healthcare due to missing multimodal data, offering a solution for real-world settings, though it appears incremental as it builds on existing techniques like mixture of experts and contrastive learning.

The paper tackles the problem of missing data in healthcare multimodal models by proposing MoSARe, a framework that integrates expert selection, cross-modal attention, and contrastive learning, achieving higher accuracy than existing models in both complete and incomplete data scenarios.

Healthcare relies on multiple types of data, such as medical images, genetic information, and clinical records, to improve diagnosis and treatment. However, missing data is a common challenge due to privacy restrictions, cost, and technical issues, making many existing multi-modal models unreliable. To address this, we propose a new multi-model model called Mixture of Experts, Symmetric Aligning, and Reconstruction (MoSARe), a deep learning framework that handles incomplete multimodal data while maintaining high accuracy. MoSARe integrates expert selection, cross-modal attention, and contrastive learning to improve feature representation and decision-making. Our results show that MoSARe outperforms existing models in situations when the data is complete. Furthermore, it provides reliable predictions even when some data are missing. This makes it especially useful in real-world healthcare settings, including resource-limited environments. Our code is publicly available at https://github.com/NazaninMn/MoSARe.

View on arXiv PDF Code

Similar