CV | May 24, 2025

Reasoning Segmentation for Images and Videos: A Survey

Yiqing Shen, Chenjia Li, Fei Xiong, Jeong-O Jeong, Tianpeng Wang, Michael Latman, Mathias Unberath

arXiv:2505.18816v1 | 12 citations | h-index: 8

Originality: Synthesis-oriented

AI Analysis

This survey organizes and synthesizes existing research on RS, a task that aims to make human-AI interaction through natural language more intuitive. Its contribution is incremental, as it does not introduce new methods or results.

This paper presents the first comprehensive survey of Reasoning Segmentation (RS), the task of segmenting objects from implicit text queries whose interpretation requires reasoning and knowledge integration. It covers 26 state-of-the-art methods, 29 datasets and benchmarks, and applications across a range of domains.

Reasoning Segmentation (RS) aims to delineate objects based on implicit text queries, the interpretation of which requires reasoning and knowledge integration. Unlike the traditional formulation of segmentation problems, which relies on fixed semantic categories or explicit prompting, RS bridges the gap between visual perception and human-like reasoning, facilitating more intuitive human-AI interaction through natural language. Our work presents the first comprehensive survey of RS for image and video processing, examining 26 state-of-the-art methods and their evaluation metrics, as well as 29 datasets and benchmarks. We also explore existing applications of RS across diverse domains and outline their potential extensions. Finally, we identify current research gaps and highlight promising future directions.

