GraphMapper: Efficient Visual Navigation by Scene Graph Generation
The work addresses inefficient navigation for autonomous agents, though it appears incremental, since it builds on existing learning-based solutions.
The paper tackles the problem of enabling autonomous agents to navigate efficiently in new environments by learning to accumulate 3D scene graph representations while navigating, resulting in navigation policies that require fewer environment interactions than vision-based systems alone.
Understanding the geometric relationships between objects in a scene is a core capability in enabling both humans and autonomous agents to navigate in new environments. A sparse, unified representation of the scene topology will allow agents to move through their environment efficiently, communicate the environment state to others, and use the representation for diverse downstream tasks. To this end, we propose a method to train an autonomous agent to accumulate a 3D scene graph representation of its environment while simultaneously learning to navigate through that environment. We demonstrate that our approach, GraphMapper, enables the learning of effective navigation policies through fewer interactions with the environment than vision-based systems alone. Further, we show that GraphMapper can act as a modular scene encoder that operates alongside existing learning-based solutions, not only increasing navigational efficiency but also generating intermediate scene representations that are useful for other future tasks.
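To make the idea of accumulating a scene graph during navigation concrete, the following is a minimal Python sketch. It is not GraphMapper's implementation: the abstract does not specify how nodes or edges are built, so the `SceneGraph` class, the `merge_radius` and `edge_radius` thresholds, and the distance-based edge rule are all assumptions chosen for illustration. The sketch also assumes object detections already arrive as labels with 3D positions in a shared world frame, whereas the paper's agent learns its representation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Hypothetical sparse scene graph: objects as nodes, proximity as edges."""
    merge_radius: float = 0.5   # assumed: same label within this distance is the same object
    edge_radius: float = 2.0    # assumed: objects closer than this are linked by an edge
    nodes: list = field(default_factory=list)  # entries are (label, (x, y, z))
    edges: set = field(default_factory=set)    # entries are (i, j) with i < j

    def _find(self, label, pos):
        """Return the index of an existing node matching this detection, or None."""
        for i, (other_label, other_pos) in enumerate(self.nodes):
            if other_label == label and math.dist(other_pos, pos) <= self.merge_radius:
                return i
        return None

    def update(self, detections):
        """Fold one step's detections into the graph accumulated so far."""
        for label, pos in detections:
            if self._find(label, pos) is not None:
                continue  # already represented; skip the duplicate observation
            new_index = len(self.nodes)
            # Link the new object to every earlier node within edge_radius.
            for j, (_, other_pos) in enumerate(self.nodes):
                if math.dist(pos, other_pos) <= self.edge_radius:
                    self.edges.add((j, new_index))
            self.nodes.append((label, tuple(pos)))


# Two navigation steps: the chair is re-observed, the lamp is new and far away.
graph = SceneGraph()
graph.update([("chair", (0.0, 0.0, 0.0)), ("table", (1.0, 0.0, 0.0))])
graph.update([("chair", (0.1, 0.0, 0.0)), ("lamp", (5.0, 0.0, 0.0))])
```

The graph stays sparse because re-observed objects are merged rather than duplicated, which is the property the abstract relies on when it describes a compact representation reusable for downstream tasks. A learned system would replace the fixed thresholds with learned node and edge predictions.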