CORE-Seg: Reasoning-Driven Segmentation for Complex Lesions via Reinforcement Learning
This work aims to improve both the accuracy and the interpretability of complex lesion segmentation in medical imaging. It is an incremental advance that integrates reasoning with segmentation.
The paper tackles the segmentation of complex lesions in medical images by shifting from visual pattern matching to cognitive reasoning analysis. It achieves a mean Dice score of 37.06% (14.89% higher than the second-best baseline) and reduces the failure rate to 18.42%.
Medical image segmentation is undergoing a paradigm shift from conventional visual pattern matching to cognitive reasoning analysis. Although Multimodal Large Language Models (MLLMs) have shown promise in integrating linguistic and visual knowledge, significant gaps remain: existing general MLLMs possess broad common sense but lack the specialized visual reasoning required for complex lesions, whereas traditional segmentation models excel at pixel-level segmentation but lack logical interpretability. In this paper, we introduce ComLesion-14K, the first diverse Chain-of-Thought (CoT) benchmark for reasoning-driven complex lesion segmentation. To accomplish this task, we propose CORE-Seg, an end-to-end framework that integrates reasoning with segmentation through a Semantic-Guided Prompt Adapter. We design a progressive training strategy from SFT to GRPO, equipped with an adaptive dual-granularity reward mechanism to mitigate reward sparsity. Our method achieves state-of-the-art results with a mean Dice of 37.06% (14.89% higher than the second-best baseline), while reducing the failure rate to 18.42%. Project Page: https://xyxl024.github.io/CORE-Seg.github.io/
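For readers unfamiliar with the two quantities named above, the following is a minimal sketch, not the authors' implementation. It shows the standard Dice coefficient on flat binary masks and the group-relative advantage normalization commonly used in GRPO. The function names, the flat-list mask representation, and the convention that two empty masks score 1.0 are all assumptions made for illustration. The abstract does not specify the dual-granularity reward itself, so it is not reproduced here.

```python
def dice(pred, target):
    """Dice coefficient between two flat binary masks (sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as lesion in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


def mean_dice(pairs):
    """Average Dice over an iterable of (pred, target) mask pairs."""
    scores = [dice(p, t) for p, t in pairs]
    return sum(scores) / len(scores)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group, as in GRPO:
    advantage_i = (r_i - mean(r)) / (std(r) + eps)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    pred = [1, 1, 0, 0]
    target = [1, 0, 0, 0]
    print(round(dice(pred, target), 4))
    # Hypothetical rewards for four sampled responses to one prompt.
    print(group_relative_advantages([0.2, 0.4, 0.6, 0.8]))
```

Because the advantages are centered within each group, a response is reinforced only if it scores above its siblings for the same prompt, so no separate value network is needed.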