On Evaluation of Embodied Navigation Agents
This work provides a framework for coordinating research and improving reproducibility in embodied navigation. Its contribution is incremental in that it addresses evaluation methodology rather than proposing new algorithms.
The paper addresses the lack of standardized evaluation protocols in embodied navigation research by convening a working group to develop consensus recommendations, including problem statements, evaluation measures, and standard benchmarking scenarios.
Skillful mobile operation in three-dimensional environments is a primary topic of study in Artificial Intelligence. The past two years have seen a surge of creative work on navigation. This creative output has produced a plethora of sometimes incompatible task definitions and evaluation protocols. To coordinate ongoing and future research in this area, we have convened a working group to study empirical methodology in navigation research. The present document summarizes the consensus recommendations of this working group. We discuss different problem statements and the role of generalization, present evaluation measures, and provide standard scenarios that can be used for benchmarking.