CV · Feb 4

Seg-ReSearch: Segmentation with Interleaved Reasoning and External Search

arXiv:2602.04454v1 · 3 citations · h-index: 7 · Has Code
Originality: Highly original
AI Analysis

The paper addresses dynamic, open-world queries in language-guided segmentation, a problem relevant to computer vision researchers and practitioners, and it proposes a new paradigm rather than an incremental improvement.

The paper tackles a limitation of multimodal large language models (MLLMs) in segmentation: their internal knowledge is frozen. It proposes Seg-ReSearch, a paradigm that interleaves reasoning with external search, and reports substantial improvements on OK-VOS and on two existing reasoning segmentation benchmarks.

Segmentation based on language has been a popular topic in computer vision. While recent advances in multimodal large language models (MLLMs) have endowed segmentation systems with reasoning capabilities, these efforts remain confined by the frozen internal knowledge of MLLMs, which limits their potential for real-world scenarios that involve up-to-date information or domain-specific concepts. In this work, we propose Seg-ReSearch, a novel segmentation paradigm that overcomes the knowledge bottleneck of existing approaches. By enabling interleaved reasoning and external search, Seg-ReSearch empowers segmentation systems to handle dynamic, open-world queries that extend beyond the frozen knowledge of MLLMs. To effectively train this capability, we introduce a hierarchical reward design that harmonizes initial guidance with progressive incentives, mitigating the dilemma between sparse outcome signals and rigid step-wise supervision. For evaluation, we construct OK-VOS, a challenging benchmark that explicitly requires outside knowledge for video object segmentation. Experiments on OK-VOS and two existing reasoning segmentation benchmarks demonstrate that Seg-ReSearch improves on state-of-the-art approaches by a substantial margin. Code and data will be released at https://github.com/iSEE-Laboratory/Seg-ReSearch.
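The abstract does not specify how reasoning and search are interleaved, so the following is only a minimal sketch of the general pattern, not the authors' implementation. It assumes a policy that, at each turn, either issues a search query or commits to a final description of the target to segment. Every name here (`Step`, `Trace`, `interleaved_loop`, `toy_policy`, `toy_search`, `TOY_KB`) is hypothetical and chosen for illustration; a real system would replace the toy policy with an MLLM and the toy search with an external retrieval tool, and would pass the final description to a segmentation model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    kind: str     # "search" or "answer" (hypothetical action types)
    content: str  # a search query, or the final target description


@dataclass
class Trace:
    steps: List[Step] = field(default_factory=list)
    evidence: List[str] = field(default_factory=list)


def interleaved_loop(
    query: str,
    policy: Callable[[str, List[str]], Step],
    search: Callable[[str], str],
    max_turns: int = 4,
) -> Tuple[Optional[str], Trace]:
    """Alternate between reasoning (policy) and external search.

    Returns the final target description, or None if the turn
    budget runs out before the policy commits to an answer.
    """
    trace = Trace()
    for _ in range(max_turns):
        step = policy(query, trace.evidence)
        trace.steps.append(step)
        if step.kind == "answer":
            return step.content, trace
        # Any non-answer step is treated as a search request.
        trace.evidence.append(search(step.content))
    return None, trace


# Toy stand-ins (assumptions): a one-entry knowledge base and a
# policy that searches once, then answers from the retrieved fact.
TOY_KB = {"winner of the 2022 FIFA World Cup": "Argentina"}


def toy_search(q: str) -> str:
    return TOY_KB.get(q, "no result")


def toy_policy(query: str, evidence: List[str]) -> Step:
    if not evidence:
        return Step("search", "winner of the 2022 FIFA World Cup")
    return Step("answer", f"player wearing the {evidence[-1]} jersey")


answer, trace = interleaved_loop(
    "Segment a player from the team that won the 2022 FIFA World Cup",
    toy_policy,
    toy_search,
)
```

In this sketch the loop stops as soon as the policy answers, and the turn budget (`max_turns`) bounds how many searches can be issued. How such a loop is trained, including the hierarchical reward mentioned in the abstract, is not covered here.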

