IV CV Jun 9, 2025

Text-guided multi-stage cross-perception network for medical image segmentation

arXiv:2506.07475v2 1 citation

Originality: Incremental advance

AI Analysis

This work addresses the weak semantic expression of target regions caused by low contrast in medical images. It appears incremental, as it builds on existing text-guided segmentation methods.

The paper tackles weak semantic expression in medical image segmentation by proposing a text-guided multi-stage cross-perception network (TMC). TMC achieves Dice scores of 84.77%, 78.50%, and 88.73% on three public datasets (QaTa-COV19, MosMedData and Breast), outperforming UNet-based networks and existing text-guided methods.

Medical image segmentation plays a crucial role in clinical medicine, serving as a tool for auxiliary diagnosis, treatment planning, and disease monitoring, and thus helping physicians study and treat diseases. However, existing medical image segmentation methods are limited by the weak semantic expression of the target segmentation regions, which is caused by the low contrast between target and non-target regions. To address this limitation, text prompt information has great potential to capture the lesion location. However, existing text-guided methods suffer from insufficient cross-modal interaction and inadequate cross-modal feature expression. To resolve these issues, we propose the Text-guided Multi-stage Cross-perception network (TMC). In TMC, we introduce a multi-stage cross-attention module to enhance the model's understanding of semantic details and a multi-stage alignment loss to improve the consistency of cross-modal semantics. Experimental results demonstrate that TMC achieves superior performance, with Dice scores of 84.77%, 78.50%, and 88.73% on three public datasets (QaTa-COV19, MosMedData and Breast), outperforming UNet-based networks and text-guided methods.
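
For readers unfamiliar with the reported metric, the following is a minimal sketch of the standard Dice coefficient for binary masks, Dice = 2|P ∩ T| / (|P| + |T|). It is not taken from the paper: the function name `dice_score`, the nested-list mask format, and the convention that two empty masks score 1.0 are assumptions made for illustration, and the paper's exact evaluation protocol (thresholding, per-image versus per-dataset averaging) is not stated here.

```python
def dice_score(pred, target):
    """Dice coefficient for two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    # Flatten both 2D masks into 1D lists of pixels.
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")

    # Pixels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_target if t)

    if total == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
# 2 overlapping pixels, 3 + 3 foreground pixels in total: 2 * 2 / 6
print(round(dice_score(pred, target), 4))
```

A reported Dice of 84.77% corresponds to a value of about 0.8477 from a function like this, presumably averaged over the test set.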

