CV · Apr 13

TAMISeg: Text-Aligned Multi-scale Medical Image Segmentation with Semantic Encoder Distillation

Qiang Gao, Yi Wang, Yong Zhang, Yong Li, Yongbing Deng, Lan Du, Cunjian Chen

arXiv:2604.10912 · 80.5 · h-index: 6 · Has Code

Predicted impact: top 27% in CV · last 90 days · Originality: Incremental advance

AI Analysis

For medical image segmentation, TAMISeg reduces reliance on fine-grained annotations by leveraging text prompts, but the improvement is incremental over existing multi-modal methods.

TAMISeg proposes a text-guided medical image segmentation framework using clinical language prompts and semantic distillation to improve segmentation accuracy with limited annotations. It outperforms existing methods on Kvasir-SEG, MosMedData+, and QaTa-COV19 datasets.

Medical image segmentation remains challenging due to limited fine-grained annotations, complex anatomical structures, and image degradation from noise, low contrast, or illumination variation. We propose TAMISeg, a text-guided segmentation framework that incorporates clinical language prompts and semantic distillation as auxiliary semantic cues to enhance visual understanding and reduce reliance on pixel-level fine-grained annotations. TAMISeg integrates three core components: a consistency-aware encoder pretrained with strong perturbations for robust feature extraction, a semantic encoder distillation module with supervision from a frozen DINOv3 teacher to enhance semantic discriminability, and a scale-adaptive decoder that segments anatomical structures across different spatial scales. Experiments on the Kvasir-SEG, MosMedData+, and QaTa-COV19 datasets demonstrate that TAMISeg consistently outperforms existing uni-modal and multi-modal methods in both qualitative and quantitative evaluations. Code will be made publicly available at https://github.com/qczggaoqiang/TAMISeg.
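The abstract does not state which loss the semantic encoder distillation module uses. As a rough illustration only, the sketch below assumes a cosine-distance objective between student features and features from a frozen teacher, which is one common choice for feature distillation. The function name `cosine_distillation_loss` and the list-of-vectors input format are hypothetical and are not taken from the TAMISeg code.

```python
import math


def cosine_distillation_loss(student_feats, teacher_feats):
    """Mean (1 - cosine similarity) over paired feature vectors.

    ASSUMPTION: a cosine-distance objective; the paper's actual loss
    is not given in the abstract. Each argument is a list of
    equal-length float vectors, one per spatial location or token.
    Teacher vectors are fixed targets, since the teacher is frozen.
    """
    if not student_feats or len(student_feats) != len(teacher_feats):
        raise ValueError("need the same non-zero number of student and teacher vectors")
    total = 0.0
    for s, t in zip(student_feats, teacher_feats):
        dot = sum(a * b for a, b in zip(s, t))
        norm_s = math.sqrt(sum(a * a for a in s))
        norm_t = math.sqrt(sum(b * b for b in t))
        # Small epsilon guards against division by zero for all-zero vectors.
        cos = dot / (norm_s * norm_t + 1e-8)
        total += 1.0 - cos
    return total / len(student_feats)
```

Under this assumption the loss is near 0 when student and teacher features point in the same direction and grows toward 2 as they point in opposite directions. In training it would typically be added to the segmentation loss with a weighting factor.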
