RO · AI · Dec 18, 2023

Indoor and Outdoor 3D Scene Graph Generation via Language-Enabled Spatial Ontologies

arXiv:2312.11713v2 · 48 citations · h-index: 11 · IEEE Robot Autom Lett
Originality: Incremental advance
AI Analysis

This work addresses the problem of scalable 3D scene understanding for robotics in diverse environments, representing an incremental advance by extending existing indoor methods to outdoor settings.

This paper tackles the generation of 3D scene graphs in both indoor and outdoor environments, which is challenging because outdoor concept hierarchies are complex and training data is scarce. It proposes using a Large Language Model to build spatial ontologies and Logic Tensor Networks to incorporate logical rules. The authors report a significant increase in the quality of 3D scene graph generation when training on sparsely annotated data.

This paper proposes an approach to build 3D scene graphs in arbitrary indoor and outdoor environments. Such an extension is challenging: the hierarchy of concepts that describe an outdoor environment is more complex than for indoors, and manually defining such a hierarchy is time-consuming and does not scale. Furthermore, the lack of training data prevents the straightforward application of learning-based tools used in indoor settings. To address these challenges, we propose two novel extensions. First, we develop methods to build a spatial ontology defining concepts and relations relevant for indoor and outdoor robot operation. In particular, we use a Large Language Model (LLM) to build such an ontology, thus largely reducing the amount of manual effort required. Second, we leverage the spatial ontology for 3D scene graph construction using Logic Tensor Networks (LTN) to add logical rules, or axioms (e.g., "a beach contains sand"), which provide additional supervisory signals at training time, thus reducing the need for labelled data, providing better predictions, and even allowing the prediction of concepts unseen at training time. We test our approach on a variety of datasets, including indoor, rural, and coastal environments, and show that it leads to a significant increase in the quality of 3D scene graph generation with sparsely annotated data.
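To make the idea of an axiom acting as a supervisory signal more concrete, here is a minimal sketch of how a rule such as "a beach contains sand" could be relaxed into a differentiable-style loss using fuzzy logic. It is not the paper's implementation. The reading of the axiom (an object labelled sand implies that its parent place is a beach), the Reichenbach implication, the mean as a soft "for all", and every name and number below are assumptions chosen for illustration only.

```python
def implies(a, b):
    """Reichenbach fuzzy implication: truth of (a -> b) for a, b in [0, 1]."""
    return 1.0 - a + a * b


def forall(truth_values):
    """Soft universal quantifier, approximated here by the mean truth value."""
    values = list(truth_values)
    if not values:
        return 1.0  # a rule over no instances is vacuously satisfied
    return sum(values) / len(values)


def axiom_loss(edges, p_sand, p_beach):
    """Loss for the assumed axiom: for every (object, place) edge,
    Sand(object) -> Beach(place). Lower means the predictions agree
    more with the rule."""
    satisfaction = forall(
        implies(p_sand[obj], p_beach[place]) for obj, place in edges
    )
    return 1.0 - satisfaction


# Hypothetical scene graph: two objects attached to one place node.
edges = [("o1", "p1"), ("o2", "p1")]

# Hypothetical classifier outputs (probabilities), not real data.
p_sand = {"o1": 0.9, "o2": 0.1}
p_beach_consistent = {"p1": 0.8}    # place predicted as beach
p_beach_inconsistent = {"p1": 0.2}  # place predicted as not beach

loss_consistent = axiom_loss(edges, p_sand, p_beach_consistent)
loss_inconsistent = axiom_loss(edges, p_sand, p_beach_inconsistent)
```

In this toy setup the loss is lower when the place containing a likely-sand object is predicted to be a beach, so the rule penalises inconsistent predictions even when the place node has no label. That is the sense in which an axiom can stand in for missing annotations.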

