CV · AI · Nov 24, 2025

MedSAM3: Delving into Segment Anything with Medical Concepts

arXiv:2511.19046v1 · 15 citations · Has Code
Originality: Highly original
AI Analysis

This work addresses two obstacles to clinical use of medical image segmentation: poor generalizability and high annotation cost. It offers a new method for a known bottleneck.

The authors propose MedSAM-3, a text-promptable segmentation model that targets anatomical structures through open-vocabulary descriptions. They report that it significantly outperforms existing specialist and foundation models across diverse imaging modalities.

Medical image segmentation is fundamental for biomedical discovery. Existing methods lack generalizability and demand extensive, time-consuming manual annotation for new clinical applications. Here, we propose MedSAM-3, a text-promptable model for medical image and video segmentation. By fine-tuning the Segment Anything Model (SAM) 3 architecture on medical images paired with semantic conceptual labels, MedSAM-3 enables medical Promptable Concept Segmentation (PCS), allowing precise targeting of anatomical structures via open-vocabulary text descriptions rather than solely geometric prompts. We further introduce the MedSAM-3 Agent, a framework that integrates Multimodal Large Language Models (MLLMs) to perform complex reasoning and iterative refinement in an agent-in-the-loop workflow. Comprehensive experiments across diverse medical imaging modalities, including X-ray, MRI, ultrasound, CT, and video, demonstrate that our approach significantly outperforms existing specialist and foundation models. We will release our code and model at https://github.com/Joey-S-Liu/MedSAM3.
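The abstract describes the agent-in-the-loop workflow only at a high level, so the following is a minimal Python sketch of what one such refinement loop might look like, not the authors' implementation. Every name here (`Feedback`, `refine_with_agent`, `toy_segment`, `toy_critique`) is hypothetical, and the two toy functions merely stand in for a text-promptable segmenter and an MLLM critic.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Mask = List[List[int]]  # binary mask as nested lists; a stand-in for a real array type


@dataclass
class Feedback:
    """Hypothetical critic verdict: accept the mask or propose a new text prompt."""
    accept: bool
    revised_prompt: Optional[str] = None


def refine_with_agent(
    image: Mask,
    prompt: str,
    segment: Callable[[Mask, str], Mask],
    critique: Callable[[Mask, str, Mask], Feedback],
    max_rounds: int = 3,
) -> Tuple[Mask, List[str]]:
    """Segment, ask the critic, and re-prompt until accepted or out of rounds."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    history: List[str] = []
    mask: Mask = []
    for _ in range(max_rounds):
        history.append(prompt)
        mask = segment(image, prompt)
        feedback = critique(image, prompt, mask)
        # Stop when the critic is satisfied or has no better prompt to offer.
        if feedback.accept or feedback.revised_prompt is None:
            break
        prompt = feedback.revised_prompt
    return mask, history


def toy_segment(image: Mask, prompt: str) -> Mask:
    """Toy segmenter (assumption): prompts starting with 'left' select the left half."""
    width = len(image[0])
    cols = range(width // 2) if prompt.startswith("left") else range(width)
    return [[1 if c in cols else 0 for c in range(width)] for _ in image]


def toy_critique(image: Mask, prompt: str, mask: Mask) -> Feedback:
    """Toy critic (assumption): reject masks that cover the whole image."""
    covered = sum(map(sum, mask))
    total = len(mask) * len(mask[0])
    if covered < total:
        return Feedback(accept=True)
    return Feedback(accept=False, revised_prompt="left " + prompt)


image = [[0] * 4 for _ in range(2)]
final_mask, prompts_tried = refine_with_agent(image, "lung", toy_segment, toy_critique)
```

In this sketch the critic returns a revised text prompt rather than a geometric correction, which keeps the loop consistent with the text-prompted setting the abstract emphasizes; a real system could just as well return points or boxes.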

Code Implementations: 1 repo
