CV · Dec 31, 2025

OFL-SAM2: Prompt SAM2 with Online Few-shot Learner for Efficient Medical Image Segmentation

arXiv:2512.24861v1 · 1 citation · h-index: 3
Originality: Incremental advance
AI Analysis

This work targets the labor-intensive annotation and expert prompting that medical image segmentation currently requires. It is an incremental advance that extends SAM2 with online few-shot learning.

The paper adapts the Segment Anything Model 2 (SAM2) to medical image segmentation by introducing OFL-SAM2, a prompt-free framework in which a lightweight mapping network, trained on a limited number of annotated samples, replaces manual prompts. The authors report state-of-the-art performance on three diverse datasets.

The Segment Anything Model 2 (SAM2) has demonstrated remarkable promptable visual segmentation capabilities in video data, showing potential for extension to medical image segmentation (MIS) tasks involving 3D volumes and temporally correlated 2D image sequences. However, adapting SAM2 to MIS presents several challenges, including the need for extensive annotated medical data for fine-tuning and high-quality manual prompts, which are both labor-intensive and require intervention from medical experts. To address these challenges, we introduce OFL-SAM2, a prompt-free SAM2 framework for label-efficient MIS. Our core idea is to leverage limited annotated samples to train a lightweight mapping network that captures medical knowledge and transforms generic image features into target features, thereby providing additional discriminative target representations for each frame and eliminating the need for manual prompts. Crucially, the mapping network supports online parameter update during inference, enhancing the model's generalization across test sequences. Technically, we introduce two key components: (1) an online few-shot learner that trains the mapping network to generate target features using limited data, and (2) an adaptive fusion module that dynamically integrates the target features with the memory-attention features generated by frozen SAM2, leading to accurate and robust target representation. Extensive experiments on three diverse MIS datasets demonstrate that OFL-SAM2 achieves state-of-the-art performance with limited training data.
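
The two components named in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not specify the architecture, so everything here is an assumption made for illustration. The mapping network is replaced by a single logistic layer, "online update" is modeled as continuing gradient descent from previously learned weights, and the fusion module is a scalar sigmoid gate over the two feature vectors. The names fit_mapping, predict, and adaptive_fuse are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Map a generic feature vector x to a target score in (0, 1)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_mapping(features, labels, weights=None, lr=0.5, steps=200):
    """Toy stand-in for the mapping network (assumption: one logistic layer).

    features: list of feature vectors, one per annotated sample.
    labels:   list of 0/1 target labels for those samples.
    weights:  optional (w, b) from an earlier call. Passing it continues
              training from the previous state, which is how this sketch
              models an online parameter update at inference time.
    """
    dim = len(features[0])
    if weights is None:
        w, b = [0.0] * dim, 0.0
    else:
        w, b = list(weights[0]), weights[1]
    n = len(features)
    for _ in range(steps):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = predict(x, w, b) - y  # cross-entropy gradient w.r.t. the logit
            for i in range(dim):
                grad_w[i] += err * x[i] / n
            grad_b += err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def adaptive_fuse(target_feat, memory_feat, gate_w, gate_b):
    """Toy fusion (assumption: one scalar gate over the concatenated inputs).

    Returns gate * target_feat + (1 - gate) * memory_feat, element-wise.
    """
    gate = sigmoid(
        sum(g * v for g, v in zip(gate_w, target_feat + memory_feat)) + gate_b
    )
    return [gate * t + (1.0 - gate) * m for t, m in zip(target_feat, memory_feat)]


# Few-shot fit on four annotated samples, then an online refinement
# that starts from the weights already learned.
support_x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
support_y = [1, 1, 0, 0]
w, b = fit_mapping(support_x, support_y)
w, b = fit_mapping([[0.8, 0.2]], [1], weights=(w, b), steps=20)
```

In this sketch the gate decides, per call, how much weight the learned target features get relative to the memory-attention features. A gate whose weights and bias are all zero reduces to a plain average of the two.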
