CV · May 18

MedFM-Robust: Benchmarking Robustness of Medical Foundation Models

Xiangxiang Cui, Tianjin Huang, Yifang Wang, Lijie Hu, Lu Yin

arXiv:2605.19027 · 70.0

Predicted impact: top 43% in CV · last 90 days · Originality: Synthesis-oriented

AI Analysis

For clinicians and healthcare AI developers, this work highlights the vulnerability of MedFMs to distribution shifts, emphasizing the need for robustness evaluation before clinical deployment.

This paper benchmarks the robustness of medical foundation models (MedFMs) including Med-VLMs and segmentation models under real-world corruptions and perturbations, finding that performance degrades significantly across tasks.

Medical foundation models (MedFMs) have emerged as transformative tools in healthcare, demonstrating capabilities across diverse clinical applications. These models can be broadly categorized into two paradigms: Medical Vision-Language Models (Med-VLMs) and segmentation foundation models. Med-VLMs range from medical-specialized models such as LLaVA-Med and MedGemma, to general-purpose models like GPT-4o and Gemini, all capable of medical image understanding tasks including visual question answering (VQA), report generation, and visual grounding. Concurrently, the Segment Anything Model (SAM) has catalyzed a new generation of medical segmentation models, with adaptations like SAM-Med2D and MedSAM. The widespread clinical deployment of these models thus necessitates rigorous evaluation of their reliability under real-world conditions.
