RO · CV · Sep 27, 2023

Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs

arXiv:2309.15940v1 · 37 citations · h-index: 48
Originality: Highly original
AI Analysis

This paper addresses open-vocabulary, context-aware entity localization for robotics applications such as navigation and manipulation, extending beyond conventional semantic-based localization methods.

The paper tackles the problem of grounding entities such as objects and agents in 3D scenes using free-form text queries, which enables context-aware localization (e.g., 'pick up a cup on a kitchen table'). Experiments on ScanNet and a self-collected dataset show that the approach significantly outperforms previous semantic-based localization techniques.

We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for grounding a variety of entities, such as object instances, agents, and regions, with free-form text-based queries. Unlike conventional semantic-based object localization approaches, our system facilitates context-aware entity localization, allowing for queries such as "pick up a cup on a kitchen table" or "navigate to a sofa on which someone is sitting". In contrast to existing research on 3D scene graphs, OVSG supports free-form text input and open-vocabulary querying. Through a series of comparative experiments using the ScanNet dataset and a self-collected dataset, we demonstrate that our proposed approach significantly surpasses the performance of previous semantic-based localization techniques. Moreover, we highlight the practical application of OVSG in real-world robot navigation and manipulation experiments.
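To make the idea of context-aware grounding concrete, here is a toy sketch of a scene graph whose nodes are objects, agents, and regions and whose edges are text-labeled relations. A query such as "a cup on a kitchen table" is assumed to be pre-split into a target, a relation, and an anchor. This is not OVSG's implementation: the `SceneGraph` layout, the `ground` function, and the use of `difflib` string similarity as a stand-in for open-vocabulary text-embedding similarity are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Node:
    node_id: str
    kind: str   # "object", "agent", or "region"
    label: str  # free-form text description of the entity


@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: list = field(default_factory=list)  # (subject_id, relation, object_id)

    def add_node(self, node_id: str, kind: str, label: str) -> None:
        self.nodes[node_id] = Node(node_id, kind, label)

    def add_edge(self, subject_id: str, relation: str, object_id: str) -> None:
        self.edges.append((subject_id, relation, object_id))


def similar(a: str, b: str, threshold: float) -> bool:
    # Hypothetical stand-in for open-vocabulary matching: character-level
    # similarity instead of comparing learned text embeddings.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def ground(graph: SceneGraph, target: str, relation: str, anchor: str,
           reverse: bool = False, threshold: float = 0.6) -> list:
    """Return ids of nodes matching `target` linked to an `anchor` via `relation`.

    reverse=False: the target is the edge subject ("cup ON table").
    reverse=True: the target is the edge object ("person SITTING ON sofa").
    """
    matches = []
    for subject_id, rel, object_id in graph.edges:
        tgt, anc = (object_id, subject_id) if reverse else (subject_id, object_id)
        # Context check: the candidate alone is not enough; its relation
        # and the entity on the other end of the edge must match too.
        if (similar(graph.nodes[tgt].label, target, threshold)
                and similar(rel, relation, threshold)
                and similar(graph.nodes[anc].label, anchor, threshold)):
            matches.append(tgt)
    return matches


def build_example_graph() -> SceneGraph:
    # Hypothetical scene: two cups, only one of them on the kitchen table.
    g = SceneGraph()
    g.add_node("kitchen", "region", "kitchen")
    g.add_node("table_1", "object", "kitchen table")
    g.add_node("desk_1", "object", "desk")
    g.add_node("cup_1", "object", "cup")
    g.add_node("cup_2", "object", "cup")
    g.add_node("sofa_1", "object", "sofa")
    g.add_node("person_1", "agent", "person")
    g.add_edge("table_1", "in", "kitchen")
    g.add_edge("cup_1", "on", "table_1")
    g.add_edge("cup_2", "on", "desk_1")
    g.add_edge("person_1", "sitting on", "sofa_1")
    return g
```

With this toy scene, `ground(build_example_graph(), "cup", "on", "kitchen table")` returns `["cup_1"]` rather than both cups, and `ground(build_example_graph(), "sofa", "sitting on", "person", reverse=True)` returns `["sofa_1"]`. A purely label-based lookup for "cup" would return both cups; the relation and anchor are what disambiguate them.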

Code Implementations: 1 repo
