3D-Aware Object Goal Navigation via Simultaneous Exploration and Identification
This work addresses the inefficiency of 3D-aware navigation for embodied AI agents and offers a practical solution with reduced computational demands.
The paper tackles object goal navigation in unseen 3D environments by proposing a framework of two sub-policies that operate on online fused 3D points. The framework achieves the best performance among modular-based methods on the Matterport3D and Gibson datasets while requiring up to 30x less computation for training.
Object goal navigation (ObjectNav) in unseen environments is a fundamental task for Embodied AI. Agents in existing works learn ObjectNav policies based on 2D maps, scene graphs, or image sequences. Since this task takes place in 3D space, a 3D-aware agent can improve its ObjectNav capability by learning from fine-grained spatial information. However, leveraging a 3D scene representation for policy learning in this floor-level task can be prohibitively impractical, due to low sample efficiency and expensive computational cost. In this work, we propose a framework for the challenging 3D-aware ObjectNav based on two straightforward sub-policies. The two sub-policies, namely a corner-guided exploration policy and a category-aware identification policy, run simultaneously and use online fused 3D points as observations. Through extensive experiments, we show that this framework dramatically improves ObjectNav performance by learning from a 3D scene representation. Our framework achieves the best performance among all modular-based methods on the Matterport3D and Gibson datasets, while requiring up to 30x less computation for training.
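The abstract does not say how the "online fused 3D points" are built. The sketch below shows one common way to do it, not the paper's implementation. Each depth frame is back-projected with a pinhole camera model, transformed into the world frame using an assumed planar pose (yaw plus x/z translation), and merged into a voxel hash that holds per-voxel semantic label votes. Every name here (`backproject`, `camera_to_world`, `PointFusion`, `voxel_size`, the `(fx, fy, cx, cy)` intrinsics tuple, and the `(yaw, tx, tz)` pose tuple) is hypothetical and exists only for illustration.

```python
import math
from collections import Counter, defaultdict


def backproject(depth, labels, fx, fy, cx, cy):
    """Lift each valid depth pixel (u, v) to a camera-frame point with its label.

    Assumed camera convention (hypothetical): x right, y down, z forward.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # skip missing or invalid depth readings
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append(((x, y, z), labels[v][u]))
    return points


def camera_to_world(point, yaw, tx, tz):
    """Rotate about the vertical (y) axis by yaw, then translate in the x-z plane."""
    x, y, z = point
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x + s * z + tx, y, -s * x + c * z + tz)


class PointFusion:
    """Hypothetical online fusion: one voxel per occupied cell, with label votes."""

    def __init__(self, voxel_size=0.1):
        self.voxel_size = voxel_size
        self.voxels = defaultdict(Counter)  # voxel index -> Counter of labels

    def integrate(self, depth, labels, intrinsics, pose):
        """Fuse one frame; repeated observations of a cell add votes, not new points."""
        for point, label in backproject(depth, labels, *intrinsics):
            wx, wy, wz = camera_to_world(point, *pose)
            key = tuple(math.floor(c / self.voxel_size) for c in (wx, wy, wz))
            self.voxels[key][label] += 1

    def voxels_of(self, category):
        """Return voxel indices whose majority label is the given category."""
        return [k for k, votes in self.voxels.items()
                if votes.most_common(1)[0][0] == category]


if __name__ == "__main__":
    depth = [[1.0, 1.0], [0.0, 2.0]]
    labels = [["floor", "chair"], ["floor", "chair"]]
    fusion = PointFusion(voxel_size=0.5)
    for _ in range(2):  # the same view seen twice fuses into the same voxels
        fusion.integrate(depth, labels, (1.0, 1.0, 0.0, 0.0), (0.0, 0.0, 0.0))
    print(len(fusion.voxels), sorted(fusion.voxels_of("chair")))
```

Voxel hashing keeps memory bounded as the agent revisits areas. A per-voxel label vote is one simple way to give downstream exploration or identification policies a category-aware point set. The paper's actual fusion and labeling scheme may differ.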