CV | Mar 24, 2025

Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces

Chenyangguang Zhang, Alexandros Delitzas, Fangjinhua Wang, Ruida Zhang, Xiangyang Ji, Marc Pollefeys, Francis Engelmann

arXiv:2503.19199v1 | 48 citations | h-index: 16 | CVPR

Originality: Incremental advance

AI Analysis

This work addresses the challenge of modeling complex scene functionalities for applications like 3D question answering and robotic manipulation, representing an incremental advance by leveraging foundation models to overcome data scarcity.

The paper tackles the problem of predicting functional 3D scene graphs, which capture objects and their functional relationships, for real-world indoor spaces from RGB-D images. It reports that the method significantly outperforms adapted baselines such as Open3DSG and ConceptGraph on the SceneFun3D and FunGraph3D datasets.

We introduce the task of predicting functional 3D scene graphs for real-world indoor environments from posed RGB-D images. Unlike traditional 3D scene graphs that focus on spatial relationships of objects, functional 3D scene graphs capture objects, interactive elements, and their functional relationships. Due to the lack of training data, we leverage foundation models, including visual language models (VLMs) and large language models (LLMs), to encode functional knowledge. We evaluate our approach on an extended SceneFun3D dataset and a newly collected dataset, FunGraph3D, both annotated with functional 3D scene graphs. Our method significantly outperforms adapted baselines, including Open3DSG and ConceptGraph, demonstrating its effectiveness in modeling complex scene functionalities. We also demonstrate downstream applications such as 3D question answering and robotic manipulation using functional 3D scene graphs. See our project page at https://openfungraph.github.io
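
To make the distinction from purely spatial scene graphs concrete, the following is a minimal sketch of what a functional 3D scene graph could look like as a data structure. It is an illustration only, not the authors' representation: the class names, the `kind` values, the `controllers_of` helper, and the example labels and relation text are all hypothetical, and 3D geometry (masks, point clouds) is omitted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A graph node (hypothetical schema, not taken from the paper)."""
    node_id: int
    label: str  # open-vocabulary label, e.g. "ceiling light"
    kind: str   # "object" or "interactive_element"


@dataclass
class FunctionalSceneGraph:
    """Nodes are objects and interactive elements; edges are functional relations."""
    nodes: dict = field(default_factory=dict)
    # Directed edges stored as (source_id, target_id, relation_text).
    edges: list = field(default_factory=list)

    def add_node(self, node_id: int, label: str, kind: str) -> None:
        if kind not in ("object", "interactive_element"):
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[node_id] = Node(node_id, label, kind)

    def add_relation(self, source_id: int, target_id: int, relation: str) -> None:
        # Both endpoints must already exist in the graph.
        if source_id not in self.nodes or target_id not in self.nodes:
            raise KeyError("both endpoints must be added before linking them")
        self.edges.append((source_id, target_id, relation))

    def controllers_of(self, target_id: int) -> list:
        """Labels of nodes that have a functional edge pointing at target_id."""
        return [self.nodes[s].label for s, t, _ in self.edges if t == target_id]


# Example scene: a switch that operates a light, and a handle that opens a cabinet.
graph = FunctionalSceneGraph()
graph.add_node(0, "ceiling light", "object")
graph.add_node(1, "light switch", "interactive_element")
graph.add_node(2, "cabinet", "object")
graph.add_node(3, "handle", "interactive_element")
graph.add_relation(1, 0, "turns on")
graph.add_relation(3, 2, "opens")

print(graph.controllers_of(0))
```

A query such as `controllers_of` hints at how a graph of this kind might support the downstream uses the abstract mentions, for example answering "what turns on the ceiling light?" or selecting which element a robot should manipulate.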
