3D VSG: Long-term Semantic Scene Change Prediction through 3D Variable Scene Graphs
This work addresses the need for robots to operate in dynamically changing environments shared with other agents by modeling and predicting how such scenes change over the long term.
The paper tackles the problem of predicting long-term semantic scene changes in shared environments by formalizing the task of semantic scene variability estimation and proposing the Variable Scene Graph (VSG) representation. It introduces DeltaVSG, a supervised method that achieves 77.1% accuracy and 72.3% recall on the 3RScan dataset, and shows that VSG prediction speeds up active robotic change detection by 66.0% compared to a scene-change-unaware planner.
Numerous applications require robots to operate in environments shared with other agents, such as humans or other robots. However, such shared scenes are typically subject to different kinds of long-term semantic scene changes. The ability to model and predict such changes is thus crucial for robot autonomy. In this work, we formalize the task of semantic scene variability estimation and identify three main varieties of semantic scene change: changes in the position of an object, its semantic state, or the composition of a scene as a whole. To represent this variability, we propose the Variable Scene Graph (VSG), which augments existing 3D Scene Graph (SG) representations with the variability attribute, representing the likelihood of discrete long-term change events. We present a novel method, DeltaVSG, to estimate the variability of VSGs in a supervised fashion. We evaluate our method on the 3RScan long-term dataset, showing notable improvements in this novel task over existing approaches. Our method DeltaVSG achieves an accuracy of 77.1% and a recall of 72.3%, often mimicking human intuition about how indoor scenes change over time. We further show the utility of VSG prediction in the task of active robotic change detection, speeding up task completion by 66.0% compared to a scene-change-unaware planner. We release our code as open source.
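To make the representation concrete, the following is a minimal sketch of a scene graph whose nodes carry a variability attribute, one likelihood per change type. It is not the DeltaVSG implementation: the names `VSGNode`, `VariableSceneGraph`, `CHANGE_TYPES`, and `inspection_order` are hypothetical, attaching all three change types to individual nodes is an assumption, and the likelihood values are made up for illustration rather than predicted by any model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# The three varieties of change named in the text. The identifiers are illustrative.
CHANGE_TYPES = ("position", "state", "composition")


@dataclass
class VSGNode:
    """A scene-graph node carrying a variability attribute (hypothetical layout)."""
    node_id: int
    label: str
    # Likelihood in [0, 1] of each discrete long-term change event.
    variability: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for change_type, likelihood in self.variability.items():
            if change_type not in CHANGE_TYPES:
                raise ValueError(f"unknown change type: {change_type}")
            if not 0.0 <= likelihood <= 1.0:
                raise ValueError(f"likelihood out of range: {likelihood}")


@dataclass
class VariableSceneGraph:
    nodes: Dict[int, VSGNode] = field(default_factory=dict)
    # Edges stored as (source id, relation, target id).
    edges: List[Tuple[int, str, int]] = field(default_factory=list)

    def add_node(self, node: VSGNode) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, src: int, relation: str, dst: int) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((src, relation, dst))

    def inspection_order(self, change_type: str) -> List[int]:
        """Return node ids sorted from most to least likely to change."""
        if change_type not in CHANGE_TYPES:
            raise ValueError(f"unknown change type: {change_type}")
        return sorted(
            self.nodes,
            key=lambda i: self.nodes[i].variability.get(change_type, 0.0),
            reverse=True,
        )


# Toy scene with made-up likelihoods (assumption, not dataset values).
graph = VariableSceneGraph()
graph.add_node(VSGNode(0, "floor", {"position": 0.01}))
graph.add_node(VSGNode(1, "chair", {"position": 0.80}))
graph.add_node(VSGNode(2, "door", {"position": 0.10, "state": 0.70}))
graph.add_edge(1, "standing on", 0)

print(graph.inspection_order("position"))  # [1, 2, 0]
```

A change-aware planner could visit nodes in this order so that the objects most likely to have changed are checked first, which is one plausible reading of how variability estimates help active change detection.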