Social 3D Scene Graphs: Modeling Human Actions and Relations for Interactive Service Robots
The work addresses the need for socially intelligent robots to act in context-aware ways, though the contribution is incremental, since it augments existing 3D Scene Graph methods.
The paper tackles the problem of enabling robots to understand human interactions in 3D environments. It introduces Social 3D Scene Graphs, which capture humans together with their attributes, activities, and relationships, and it shows that this representation improves human activity prediction and reasoning about human-environment relations.
Understanding how people interact with their surroundings and each other is essential for enabling robots to act in socially compliant and context-aware ways. While 3D Scene Graphs have emerged as a powerful semantic representation for scene understanding, existing approaches largely ignore humans in the scene, in part due to the lack of annotated human-environment relationships. Moreover, existing methods typically capture open-vocabulary relations only from single image frames, which limits their ability to model long-range interactions beyond the observed content. We introduce Social 3D Scene Graphs, an augmented 3D Scene Graph representation that captures humans, their attributes, activities, and relationships in the environment, both local and remote, using an open-vocabulary framework. Furthermore, we introduce a new benchmark consisting of synthetic environments with comprehensive human-scene relationship annotations and diverse types of queries for evaluating social scene understanding in 3D. Experiments demonstrate that our representation improves human activity prediction and reasoning about human-environment relations, paving the way toward socially intelligent robots.
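As a rough illustration of the kind of structure the abstract describes, the following Python sketch models a scene graph whose nodes are humans and objects and whose edges are free-text relations, each flagged as local or remote. It is only a toy data structure under stated assumptions: every class name, field name, and the `remote` flag are hypothetical and are not taken from the paper, which does not specify its implementation here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A scene entity: a human or an object (hypothetical schema)."""
    node_id: str
    kind: str                                 # "human" or "object" (assumed labels)
    label: str                                # open-vocabulary text description
    position: tuple[float, float, float]      # 3D location in the scene frame
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class Relation:
    """A directed edge with a free-text predicate (hypothetical schema)."""
    source: str
    target: str
    predicate: str
    remote: bool = False                      # True if the target is not nearby


@dataclass
class SocialSceneGraph:
    nodes: dict[str, Node] = field(default_factory=dict)
    relations: list[Relation] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def relate(self, source: str, target: str, predicate: str,
               remote: bool = False) -> None:
        # Both endpoints must already exist in the graph.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before relating them")
        self.relations.append(Relation(source, target, predicate, remote))

    def relations_of(self, node_id: str,
                     remote: Optional[bool] = None) -> list[Relation]:
        # Outgoing relations of a node, optionally filtered by local/remote.
        return [r for r in self.relations
                if r.source == node_id
                and (remote is None or r.remote == remote)]


# Example scene: one person, one nearby object, one distant object.
graph = SocialSceneGraph()
graph.add_node(Node("h1", "human", "person sitting", (1.0, 2.0, 0.0),
                    {"activity": "watching television"}))
graph.add_node(Node("o1", "object", "sofa", (1.0, 2.1, 0.0)))
graph.add_node(Node("o2", "object", "television", (4.0, 2.0, 1.0)))
graph.relate("h1", "o1", "sitting on")
graph.relate("h1", "o2", "watching", remote=True)

remote_targets = [r.target for r in graph.relations_of("h1", remote=True)]
```

In this sketch, a query such as "what is the person watching?" reduces to filtering the outgoing relations of the human node; how the paper actually builds and queries its graphs is not described in the abstract.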