CV · Jul 5, 2022

Toward Explainable and Fine-Grained 3D Grounding through Referring Textual Phrases

arXiv:2207.01821v2 · 19 citations · h-index: 33
Originality: Incremental advance
AI Analysis

The paper addresses the need for more interpretable and fine-grained 3D scene understanding in AI applications, though it is an incremental advance that builds on existing 3DVG methods.

The paper tackles fine-grained 3D visual grounding by introducing 3D Phrase Aware Grounding (3DPAG), which localizes target objects by identifying all phrase-related objects and reasoning over contextual phrases. Training with its phrase-level annotations raises a previous state-of-the-art method's overall accuracy by 3.9%, 3.5%, and 4.6% on Nr3D, Sr3D, and ScanRefer, respectively.

Recent progress in 3D scene understanding has explored visual grounding (3DVG), which localizes a target object from a language description. However, existing methods only consider the dependency between the entire sentence and the target object, ignoring fine-grained relationships between contexts and non-target objects. In this paper, we extend 3DVG to a more fine-grained and interpretable task, called 3D Phrase Aware Grounding (3DPAG). The 3DPAG task aims to localize the target objects in a 3D scene by explicitly identifying all phrase-related objects and then reasoning according to the contextual phrases. To tackle this problem, we used a self-developed platform to manually label about 227K phrase-level annotations on 88K sentences from widely used 3DVG datasets, namely Nr3D, Sr3D, and ScanRefer. Building on these datasets, we can extend previous 3DVG methods to the fine-grained phrase-aware scenario. This is achieved through a novel phrase-object alignment optimization and phrase-specific pre-training, which also boost conventional 3DVG performance. Extensive results confirm significant improvements: the previous state-of-the-art method achieves overall accuracy gains of 3.9%, 3.5%, and 4.6% on Nr3D, Sr3D, and ScanRefer, respectively.
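To make the idea of phrase-object alignment more concrete, here is a minimal sketch, not the authors' implementation. It assumes that each phrase annotation links a token span to one candidate object in the scene, that phrase and object embeddings come from upstream encoders (not shown), and that alignment is scored with a dot product and trained with a softmax cross-entropy over the scene's candidate objects. The names `PhraseAnnotation` and `phrase_object_alignment_loss` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PhraseAnnotation:
    """Hypothetical phrase-level label: a token span tied to one scene object."""
    start: int      # first token of the phrase (inclusive); would be used upstream to pool phrase features
    end: int        # last token of the phrase (exclusive)
    object_id: int  # index of the matching object among the scene's candidates


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity score between a phrase embedding and an object embedding."""
    return sum(x * y for x, y in zip(a, b))


def log_softmax(scores: List[float]) -> List[float]:
    """Log-probabilities over candidates; subtracts the max for numerical stability."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def phrase_object_alignment_loss(
    phrase_feats: List[List[float]],
    object_feats: List[List[float]],
    annotations: List[PhraseAnnotation],
) -> float:
    """Mean cross-entropy of each annotated phrase against its labeled object.

    phrase_feats[i] embeds the span of annotations[i]; object_feats[j] embeds
    candidate object j. Both encoders are assumptions, not part of this sketch.
    """
    total = 0.0
    for feat, ann in zip(phrase_feats, annotations):
        scores = [dot(feat, obj) for obj in object_feats]
        total -= log_softmax(scores)[ann.object_id]
    return total / len(annotations)


# Toy usage: "the chair next to the table" with two annotated phrases,
# "the chair" (tokens 0-1) -> object 0 and "the table" (tokens 4-5) -> object 1.
objects = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
phrases = [[2.0, 0.0], [0.0, 2.0]]
anns = [PhraseAnnotation(0, 2, 0), PhraseAnnotation(4, 6, 1)]
print(round(phrase_object_alignment_loss(phrases, objects, anns), 4))  # 0.4076
```

Lowering this loss pushes each phrase embedding toward its annotated object and away from the other candidates, which is one plausible way phrase-level labels could supervise non-target objects as well as the target. The paper's actual objective may differ.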

