How to Train PointGoal Navigation Agents on a (Sample and Compute) Budget
This work provides efficiency improvements for researchers and practitioners training PointGoal navigation agents, particularly those with limited computational resources, by identifying key design choices that significantly boost performance under budget constraints.
This paper investigates PointGoal navigation under strict sample (75 million frames) and compute (1 GPU for 1 day) budgets. By optimizing advantage estimation, visual encoder architecture, and a hyper-parameter, RGB-D agents achieved an 8 SPL (14% relative) improvement on Gibson and 20 SPL (38% relative) on Matterport3D under a sample budget, and 19 SPL (32% relative) on Gibson and 35 SPL (220% relative) on Matterport3D under a compute budget.
PointGoal navigation has seen significant recent interest and progress, spurred on by the Habitat platform and associated challenge. In this paper, we study PointGoal navigation under both a sample budget (75 million frames) and a compute budget (1 GPU for 1 day). We conduct an extensive set of experiments, cumulatively totaling over 50,000 GPU-hours, that let us identify and discuss a number of ostensibly minor but significant design choices: the advantage estimation procedure (a key component in training), visual encoder architecture, and a seemingly minor hyper-parameter change. Overall, these design choices lead to considerable and consistent improvements over the baselines present in Savva et al. Under a sample budget, performance for RGB-D agents improves by 8 SPL on Gibson (14% relative improvement) and 20 SPL on Matterport3D (38% relative improvement). Under a compute budget, performance for RGB-D agents improves by 19 SPL on Gibson (32% relative improvement) and 35 SPL on Matterport3D (220% relative improvement). We hope our findings and recommendations will serve to make the community's experiments more efficient.
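The results above are reported both as absolute SPL gains and as relative improvements. As a minimal sketch of how those two quantities relate, the following assumes SPL follows the standard definition (Success weighted by Path Length: per-episode success times the ratio of shortest-path length to the longer of the agent's path and the shortest path, averaged over episodes). The function names `spl` and `relative_improvement` and the episode values are hypothetical and chosen for illustration; they are not taken from the paper or from the Habitat codebase.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, in [0, 1].

    Assumes every shortest-path length is strictly positive.
    """
    n = len(successes)
    if n == 0 or n != len(shortest_lengths) or n != len(path_lengths):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        # A failed episode contributes 0; a successful one contributes
        # the path-efficiency ratio, capped at 1.
        total += float(success) * shortest / max(taken, shortest)
    return total / n


def relative_improvement(baseline: float, improved: float) -> float:
    """Absolute gain expressed as a fraction of the baseline score."""
    return (improved - baseline) / baseline


# Hypothetical episodes (not results from the paper):
# optimal success, success with a path twice as long as needed, and a failure.
score = spl([True, True, False], [10.0, 10.0, 10.0], [10.0, 20.0, 15.0])
gain = relative_improvement(baseline=50.0, improved=60.0)
```

The abstract quotes SPL on a 0 to 100 scale, so a value returned by `spl` would be multiplied by 100 before comparison. Because the relative improvement divides by the baseline, the same absolute gain looks much larger when the baseline is weak, which is consistent with the large relative figures reported under the compute budget.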