CV · AI · Mar 6

Prompt Group-Aware Training for Robust Text-Guided Nuclei Segmentation

Yonghuang Wu, Zhenyang Liang, Wenwen Zeng, Xuan Xie, Jinhua Yu

arXiv:2603.06384v1 · 9.4 · h-index: 15

Predicted impact: top 51% in CV · last 90 days
Originality: Highly original

AI Analysis

This work improves the robustness and generalization of text-guided nuclei segmentation for computational pathology, which is crucial for reliable clinical and pathology workflows.

This paper addresses the problem of prompt sensitivity in text-guided medical image segmentation, where semantically equivalent prompts lead to inconsistent results. The authors propose a prompt group-aware training framework that uses quality-guided group regularization and a logit-level consistency constraint to align predictions within prompt groups, achieving an average Dice improvement of 2.16 points on six zero-shot cross-dataset nuclei segmentation tasks.

Foundation models such as Segment Anything Model 3 (SAM3) enable flexible text-guided medical image segmentation, yet their predictions remain highly sensitive to prompt formulation. Even semantically equivalent descriptions can yield inconsistent masks, limiting reliability in clinical and pathology workflows. We reformulate prompt sensitivity as a group-wise consistency problem. Semantically related prompts are organized into prompt groups sharing the same ground-truth mask, and a prompt group-aware training framework is introduced for robust text-guided nuclei segmentation. The approach combines (i) a quality-guided group regularization that leverages segmentation loss as an implicit ranking signal, and (ii) a logit-level consistency constraint with a stop-gradient strategy to align predictions within each group. The method requires no architectural modification and leaves inference unchanged. Extensive experiments on multi-dataset nuclei benchmarks show consistent gains under textual prompting and markedly reduced performance variance across prompt quality levels. On six zero-shot cross-dataset tasks, our method improves Dice by an average of 2.16 points. These results demonstrate improved robustness and generalization for vision-language segmentation in computational pathology.
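The abstract does not give the exact loss formulation, so the following is only a minimal NumPy sketch of how a logit-level consistency term with a stop-gradient might look for one prompt group. Everything specific here is an assumption made for illustration, not the authors' implementation: the function names `dice_loss` and `group_consistency`, the use of soft Dice as the segmentation loss, the choice of the lowest-loss prompt as the alignment target, and the mean-squared-error form of the consistency term. Gradients are written out by hand so the effect of the stop-gradient is visible without an autodiff library.

```python
import numpy as np


def dice_loss(logits, mask, eps=1e-6):
    """Soft Dice loss of one logit map against a binary mask (assumed seg loss)."""
    prob = 1.0 / (1.0 + np.exp(-logits))
    inter = (prob * mask).sum()
    return 1.0 - (2.0 * inter + eps) / (prob.sum() + mask.sum() + eps)


def group_consistency(group_logits, mask):
    """Consistency loss and hand-derived logit gradients for one prompt group.

    group_logits: array of shape (K, H, W), one logit map per prompt.
    mask:         array of shape (H, W), ground truth shared by the group.
    """
    # Segmentation loss acts as an implicit ranking of the prompts.
    seg_losses = np.array([dice_loss(z, mask) for z in group_logits])
    anchor = int(np.argmin(seg_losses))  # assumption: best prompt is the target

    # Stop-gradient: the anchor logits are copied and treated as a constant.
    target = group_logits[anchor].copy()

    diff = group_logits - target  # anchor row is exactly zero
    n_other = max(len(group_logits) - 1, 1)
    loss = (diff ** 2).mean(axis=(1, 2)).sum() / n_other

    # d(loss)/d(logits) with the target held constant. The anchor row is zero,
    # so the consistency term never pulls the best prediction toward worse ones.
    grads = 2.0 * diff / (target.size * n_other)
    return loss, grads, anchor, seg_losses
```

In an autodiff framework the same idea would normally be expressed by detaching the target tensor (for example `Tensor.detach()` in PyTorch or `jax.lax.stop_gradient` in JAX) rather than deriving gradients manually. Because the term is only added to the training objective, a sketch like this is compatible with the abstract's statement that inference is left unchanged.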
