RO · May 29

Seeing Fast and Slow: Bimodal 3D Scene Graphs for Open-set Tasks

arXiv:2605.31067 · 59.9 · Has Code
Predicted impact: top 34% in RO · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses the problem of efficient, adaptable 3D scene representation for robots performing open-set tasks, with the goal of enabling real-time deployment.

This paper proposes BiMoSG, a bimodal 3D scene graph generation approach that can switch between a fast, coarse representation and a slow, fine open-vocabulary representation. The approach is significantly faster than open-source state-of-the-art methods, which allows scene graph generation to be integrated with task execution in real time.

Open-set task execution can benefit significantly from seamlessly switching between coarse and fine scene representations, depending on the context and on the information that evolves as the robot explores the environment. For example, it is often sufficient to start with a coarse scene representation and to employ a finer, more granular representation only when the robot encounters regions that are likely to contain task-relevant objects. Hence, in this work, we propose BiMoSG, a bimodal 3D scene graph generation approach for open-set tasks. BiMoSG employs a "fast" mode by default to efficiently generate a coarse 3D scene graph and can switch to a "slow" mode to generate a finer, open-vocabulary 3D scene graph of task-relevant objects. We demonstrate that our proposed 3D scene graph generation approach is significantly faster than open-source state-of-the-art approaches. This allows us to integrate the scene graph generation process with task execution for real-time deployment.
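The abstract does not say how BiMoSG decides when to switch from the fast mode to the slow mode. The following is only a minimal sketch of the general idea of a default-coarse, refine-on-demand loop, under stated assumptions: each explored region gets a hypothetical relevance score against the task, and the expensive fine pass runs only when that score reaches a hypothetical threshold. `RegionNode`, `task_relevance`, `choose_mode`, `build_graph`, `fine_detector`, and `threshold` are illustrative names, not the authors' API, and the simple label-overlap score stands in for whatever relevance signal (for example, open-vocabulary embedding similarity) a real system might use.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    FAST = "fast"  # coarse scene graph: the default
    SLOW = "slow"  # fine, open-vocabulary scene graph


@dataclass
class RegionNode:
    """Hypothetical coarse node: a region (e.g. a room) with cheap labels."""
    name: str
    coarse_labels: list[str]
    objects: list[str] = field(default_factory=list)  # filled only in SLOW mode


def task_relevance(region: RegionNode, task_terms: set[str]) -> float:
    """Assumed relevance score: fraction of coarse labels that match the task."""
    if not region.coarse_labels:
        return 0.0
    hits = sum(label in task_terms for label in region.coarse_labels)
    return hits / len(region.coarse_labels)


def choose_mode(region: RegionNode, task_terms: set[str], threshold: float = 0.5) -> Mode:
    """Stay in FAST mode unless the region looks task-relevant enough."""
    return Mode.SLOW if task_relevance(region, task_terms) >= threshold else Mode.FAST


def build_graph(regions, task_terms, fine_detector, threshold=0.5):
    """Visit regions in exploration order; run the costly fine pass only when needed."""
    slow_calls = 0
    for region in regions:
        if choose_mode(region, task_terms, threshold) is Mode.SLOW:
            region.objects = fine_detector(region)  # hypothetical open-vocabulary pass
            slow_calls += 1
    return regions, slow_calls


# Usage with a stand-in detector (hypothetical; a real one would run perception models).
kitchen = RegionNode("kitchen", ["counter", "fridge", "sink"])
hallway = RegionNode("hallway", ["door"])
fake_detector = lambda r: [f"{r.name}/mug", f"{r.name}/kettle"]
graph, calls = build_graph([kitchen, hallway], {"fridge", "sink"}, fake_detector)
print(calls, kitchen.objects, hallway.objects)
```

In this sketch the kitchen scores 2/3 and is refined, while the hallway scores 0 and keeps only its coarse node, so the expensive pass runs once. The design point it illustrates is that fine-grained cost scales with the number of regions judged relevant rather than with every region explored.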

