IV | AI | CV | Jun 30, 2025

Multimodal, Multi-Disease Medical Imaging Foundation Model (MerMED-FM)

arXiv:2507.00185v1 | 2 citations | h-index: 19
Originality: Incremental advance
AI Analysis

The paper addresses the need for adaptable, versatile AI models in medical imaging across diverse disciplines, though the advance appears incremental because it builds on existing foundation model approaches.

The paper tackles inconsistent clinical accuracy in multimodal, multi-disease medical imaging models by developing MerMED-FM, a foundation model trained on 3.3 million images across seven modalities, which achieves AUROCs ranging from 0.858 (CXR) to 0.988 (OCT) across modalities.

Current artificial intelligence models for medical imaging are predominantly single modality and single disease. Attempts to create multimodal and multi-disease models have resulted in inconsistent clinical accuracy. Furthermore, training these models typically requires large, labour-intensive, well-labelled datasets. We developed MerMED-FM, a state-of-the-art multimodal, multi-specialty foundation model trained using self-supervised learning and a memory module. MerMED-FM was trained on 3.3 million medical images from over ten specialties and seven modalities, including computed tomography (CT), chest X-rays (CXR), ultrasound (US), pathology patches, color fundus photography (CFP), optical coherence tomography (OCT) and dermatology images. MerMED-FM was evaluated across multiple diseases and compared against existing foundational models. Strong performance was achieved across all modalities, with AUROCs of 0.988 (OCT); 0.982 (pathology); 0.951 (US); 0.943 (CT); 0.931 (skin); 0.894 (CFP); 0.858 (CXR). MerMED-FM has the potential to be a highly adaptable, versatile, cross-specialty foundation model that enables robust medical imaging interpretation across diverse medical disciplines.
