CV · Feb 22

SegMoTE: Token-Level Mixture of Experts for Medical Image Segmentation

Yujie Lu, Jingwen Li, Sibo Ju, Yanzhou Su, He Yao, Yisong Liu, Min Zhu, Junlong Cheng

arXiv:2602.19213v1 · 1 citation · h-index: 4

Originality: Highly original

AI Analysis

This work advances practical deployment of foundation vision models in clinical applications by enabling efficient and robust medical image segmentation with low annotation costs.

The paper addresses two bottlenecks in adapting general segmentation models such as SAM to medical imaging: limited modality- and anatomy-specific generalization, and noisy supervision from large, unselected datasets. It proposes SegMoTE, which achieves state-of-the-art performance across diverse medical tasks while training on a curated dataset less than 1% the size of existing large-scale datasets.

Medical image segmentation is vital for clinical diagnosis and quantitative analysis, yet remains challenging due to the heterogeneity of imaging modalities and the high cost of pixel-level annotations. Although general interactive segmentation models like SAM have achieved remarkable progress, their transfer to medical imaging still faces two key bottlenecks: (i) the lack of adaptive mechanisms for modality- and anatomy-specific tasks, which limits generalization in out-of-distribution medical scenarios; and (ii) the practice of fine-tuning current medical adaptation methods on large, heterogeneous datasets without selection, which leads to noisy supervision, higher cost, and negative transfer. To address these issues, we propose SegMoTE, an efficient and adaptive framework for medical image segmentation. SegMoTE preserves SAM's original prompt interface, efficient inference, and zero-shot generalization while introducing only a small number of learnable parameters to dynamically adapt across modalities and tasks. In addition, we design a progressive prompt tokenization mechanism that enables fully automatic segmentation, significantly reducing annotation dependence. Trained on MedSeg-HQ, a curated dataset less than 1% the size of existing large-scale datasets, SegMoTE achieves state-of-the-art (SOTA) performance across diverse imaging modalities and anatomical tasks. It represents the first efficient, robust, and scalable adaptation of general segmentation models to the medical domain under extremely low annotation cost, advancing the practical deployment of foundation vision models in clinical applications.
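The abstract does not spell out how SegMoTE routes tokens, so the following is only a generic illustration of what "token-level mixture of experts" usually means: a small router scores every token against each expert, the top-k experts process that token, and their outputs are blended with renormalized gate weights. Everything here is an assumption made for illustration (the function names, the linear experts, the top-2 routing, and the toy dimensions); it is not the authors' implementation.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def route_token(token, router, experts, top_k=2):
    """Send one token to its top_k experts and mix their outputs.

    router:  (num_experts x dim) matrix, one scoring row per expert.
    experts: list of (dim x dim) matrices (hypothetical linear experts).
    """
    logits = matvec(router, token)  # one score per expert
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    chosen = ranked[:top_k]
    gates = softmax([logits[i] for i in chosen])  # renormalize over chosen experts
    out = [0.0] * len(token)
    for gate, idx in zip(gates, chosen):
        expert_out = matvec(experts[idx], token)
        out = [o + gate * e for o, e in zip(out, expert_out)]
    return out, chosen, gates


def moe_layer(tokens, router, experts, top_k=2):
    """Apply token-level routing independently to every token."""
    return [route_token(t, router, experts, top_k)[0] for t in tokens]


# Toy setup: 5 tokens of dimension 4, routed among 3 experts.
rng = random.Random(0)
dim, num_experts = 4, 3
router = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]
experts = [
    [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    for _ in range(num_experts)
]
tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
mixed = moe_layer(tokens, router, experts)
```

Because routing is decided per token rather than per image, different regions of the same scan can be handled by different experts, which is one plausible way to read the abstract's claim of dynamic adaptation across modalities and tasks with few added parameters.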
