CV · Nov 21, 2024

Multimodal 3D Reasoning Segmentation with Complex Scenes

arXiv:2411.13927v4 · 7 citations · h-index: 8
Originality: Incremental advance
AI Analysis

This work aims to help embodied AI and similar applications interpret human intentions and handle complex multi-object 3D scenes. It is a domain-specific advance.

The paper addresses two gaps in existing 3D scene understanding methods: limited reasoning ability and oversimplified scenarios. It proposes a 3D reasoning segmentation task for complex multi-object scenes with spatial relations, and introduces a benchmark (ReasonSeg3D) and a network (MORE3D) that the authors report excels at this task.

Recent developments in multimodal learning have greatly advanced research on 3D scene understanding in real-world tasks such as embodied AI. However, most existing studies face two common challenges: 1) they lack the reasoning ability needed to interact with humans and interpret their intentions, and 2) they focus on scenarios with single-category objects and over-simplified textual descriptions, neglecting multi-object scenarios with complicated spatial relations among objects. We address these challenges by proposing a 3D reasoning segmentation task for scenes that contain multiple objects. The task produces 3D segmentation masks together with detailed textual explanations enriched by the 3D spatial relations among objects. To this end, we create ReasonSeg3D, a large-scale and high-quality benchmark that integrates 3D segmentation masks and 3D spatial relations with generated question-answer pairs. In addition, we design MORE3D, a novel 3D reasoning network that handles queries involving multiple objects and is tailored for 3D scene understanding. MORE3D learns detailed explanations of 3D relations and uses them to capture the spatial information of objects and to reason about its textual outputs. Extensive experiments show that MORE3D excels in reasoning about and segmenting complex multi-object 3D scenes. In addition, ReasonSeg3D offers a valuable platform for future exploration of 3D reasoning segmentation. The data and code will be released.

