RO · CV · May 31

Expanding Spatial and Temporal Context for Robotic Imitation Learning With Scene Graphs

arXiv:2606.01072 · 69.3
AI Analysis

This work addresses the challenge of spatial and temporal context in robotic imitation learning for real-world environments.

The paper proposes using scene graphs as a structured memory mechanism in imitation learning to handle partial observability and long-horizon tasks. Experiments show substantial improvements in policy performance for mobile and tabletop manipulation.

Imitation learning enables robots to learn how to execute tasks via observation. However, real-world environments like homes and offices are often severely partially observed due to their large spatial scales. In addition, many tasks involve executing a series of subtasks, requiring autonomous robots to reason over extended time horizons. To address these challenges, we propose using scene graphs as an explicit and structured memory mechanism in imitation learning. By maintaining a dynamic scene graph that captures object-centric relationships and their evolution over time, our method allows the agent to retain relevant historical context during task execution and to reason efficiently over incrementally accrued scene information. Our experiments on simulated mobile manipulation and real-world tabletop manipulation demonstrate that our approach substantially improves policy performance, particularly in settings that demand long-term reasoning and robust generalization under partial observability.
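
As a rough illustration of what a scene graph used as memory could look like, here is a minimal Python sketch. It is not the paper's implementation: the class `SceneGraphMemory`, its methods, and the object and relation names are all hypothetical, and the update rule (refresh the outgoing relations of any object that is re-observed, keep everything else) is an assumption chosen for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraphMemory:
    """Toy scene-graph memory (hypothetical, for illustration only)."""
    # object name -> {"attrs": dict, "last_seen": int}
    nodes: dict = field(default_factory=dict)
    # (subject, relation, object) -> step at which the relation was observed
    edges: dict = field(default_factory=dict)

    def update(self, step, objects, relations):
        """Merge one partial observation; objects out of view are kept."""
        for name, attrs in objects.items():
            self.nodes[name] = {"attrs": dict(attrs), "last_seen": step}
        observed = set(objects)
        # Assumption: a re-observed object gets its outgoing relations refreshed.
        self.edges = {
            triple: t for triple, t in self.edges.items()
            if triple[0] not in observed
        }
        for triple in relations:
            self.edges[tuple(triple)] = step

    def relations_of(self, name):
        """Return remembered triples that mention `name`, sorted."""
        return sorted(t for t in self.edges if name in (t[0], t[2]))


memory = SceneGraphMemory()
# Step 0: the robot sees a mug on a table.
memory.update(0, {"mug": {}, "table": {}}, [("mug", "on", "table")])
# Step 1: only a shelf is in view; the mug's relation is still remembered.
memory.update(1, {"shelf": {}}, [])
print(memory.relations_of("mug"))
# Step 2: the mug is re-observed in the gripper, replacing its old relation.
memory.update(2, {"mug": {}, "gripper": {}}, [("mug", "in", "gripper")])
print(memory.relations_of("mug"))
```

In this sketch the graph accumulates information across steps, so a policy that reads it can still refer to objects that have left the field of view. How the actual method encodes the graph and feeds it to the policy is not specified in the abstract.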
