MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation
This work provides a new benchmark for evaluating how well embodied navigation agents use semantic map memory, a capability that matters to researchers developing long-horizon navigation systems.
This paper introduces the multiON task, a new benchmark for evaluating semantic map memory in navigation agents. The task requires agents to visit a sequence of objects in photorealistic 3D environments. The authors find that navigation performance degrades significantly with increased task complexity, and that a simple semantic map agent performs surprisingly well compared to more complex neural image feature map agents.
Navigation tasks in photorealistic 3D environments are challenging because they require perception and effective planning under partial observability. Recent work shows that map-like memory is useful for long-horizon navigation tasks. However, a focused investigation of the impact of maps on navigation tasks of varying complexity has not yet been performed. We propose the multiON task, which requires navigation to an episode-specific sequence of objects in a realistic environment. MultiON generalizes the ObjectGoal navigation task and explicitly tests the ability of navigation agents to locate previously observed goal objects. We perform a set of multiON experiments to examine how a variety of agent models perform across a spectrum of navigation task complexities. Our experiments show that: i) navigation performance degrades dramatically with escalating task complexity; ii) a simple semantic map agent performs surprisingly well relative to more complex neural image feature map agents; and iii) even oracle map agents achieve relatively low performance, indicating the potential for future work in training embodied navigation agents using maps. Video summary: https://youtu.be/yqTlHNIcgnY
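To make the task structure concrete, the following is a minimal sketch of how progress through an episode-specific, ordered sequence of goals could be tracked. It is an illustration only, not the benchmark's implementation: the class name `MultiObjectEpisode`, the 2D goal positions, and the `success_radius` threshold are all assumptions, since the abstract does not state how reaching a goal is determined.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class MultiObjectEpisode:
    """Hypothetical tracker for an ordered multi-object navigation episode."""

    goals: list                  # ordered (x, y) goal positions (assumed 2D simplification)
    success_radius: float = 1.0  # assumed distance threshold, not taken from the paper
    next_goal: int = 0           # index of the goal the agent must reach next

    @property
    def done(self):
        # The episode is complete once every goal has been reached in order.
        return self.next_goal >= len(self.goals)

    @property
    def progress(self):
        # Fraction of goals reached so far.
        return self.next_goal / len(self.goals)

    def visit(self, x, y):
        """Report the agent's position; advance only if it is near the current goal."""
        if self.done:
            return False
        gx, gy = self.goals[self.next_goal]
        if hypot(x - gx, y - gy) <= self.success_radius:
            self.next_goal += 1
            return True
        return False


episode = MultiObjectEpisode(goals=[(0.0, 0.0), (5.0, 5.0)])
episode.visit(5.0, 5.0)   # ignored: the second goal cannot be claimed before the first
episode.visit(0.5, 0.0)   # counts: within the assumed radius of the first goal
print(episode.progress)
```

Because goals only count in order, an agent that passes a later goal early gains nothing at that moment unless it remembers where the object was, which is the kind of behavior the task is meant to probe.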