CREStE: Scalable Mapless Navigation with Internet Scale Priors and Counterfactual Guidance
This work addresses scalable and robust autonomous navigation in diverse real-world environments for robotics applications, and it reports a substantial, specific gain over prior methods rather than an incremental improvement.
The paper tackles open-world generalization and robustness in outdoor urban mapless navigation. It introduces CREStE, a learning-based framework that combines visual foundation model distillation with counterfactual inverse reinforcement learning. CREStE outperforms state-of-the-art approaches with 70% fewer human interventions and completes a 2-kilometer mission in an unseen environment with just 1 intervention.
We introduce CREStE, a scalable learning-based mapless navigation framework to address the open-world generalization and robustness challenges of outdoor urban navigation. Key to achieving this is learning perceptual representations that generalize to open-set factors (e.g., novel semantic classes, terrains, dynamic entities) and inferring expert-aligned navigation costs from limited demonstrations. CREStE addresses both of these issues, introducing 1) a visual foundation model (VFM) distillation objective for learning open-set structured bird's-eye-view perceptual representations, and 2) counterfactual inverse reinforcement learning (IRL), a novel active learning formulation that uses counterfactual trajectory demonstrations to reason about the most important cues when inferring navigation costs. We evaluate CREStE on the task of kilometer-scale mapless navigation in a variety of city, off-road, and residential environments and find that it outperforms all state-of-the-art approaches with 70% fewer human interventions. This includes a 2-kilometer mission in an unseen environment with just 1 intervention, showcasing its robustness and effectiveness for long-horizon mapless navigation. Videos and additional materials can be found on the project page: https://amrl.cs.utexas.edu/creste
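To make the idea of learning costs from counterfactual trajectories more concrete, the following is a minimal sketch and not the paper's formulation. It assumes a bird's-eye-view cost grid and a simple hinge ranking loss that asks the expert path to be cheaper than each counterfactual path by a margin. The names `path_cost`, `counterfactual_ranking_loss`, and `margin`, as well as the toy grid, are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple

Cell = Tuple[int, int]  # (row, col) index into a bird's-eye-view cost grid


def path_cost(cost_map: Sequence[Sequence[float]], path: Sequence[Cell]) -> float:
    """Sum the per-cell cost along a trajectory."""
    return sum(cost_map[r][c] for r, c in path)


def counterfactual_ranking_loss(
    cost_map: Sequence[Sequence[float]],
    expert: Sequence[Cell],
    counterfactuals: Sequence[Sequence[Cell]],
    margin: float = 1.0,
) -> float:
    """Hypothetical hinge loss (an assumption, not the authors' objective).

    The loss is zero once every counterfactual path costs at least
    `margin` more than the expert path under the current cost map.
    """
    if not counterfactuals:
        return 0.0
    expert_cost = path_cost(cost_map, expert)
    penalties = [
        max(0.0, margin + expert_cost - path_cost(cost_map, cf))
        for cf in counterfactuals
    ]
    return sum(penalties) / len(penalties)


# Toy 3x3 grid: the center cell is an undesirable region with high cost.
cost_map = [
    [0.1, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.1],
]
expert = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]  # detours around the center
shortcut = [(0, 0), (1, 1), (2, 2)]                # counterfactual through the center

loss_small_margin = counterfactual_ranking_loss(cost_map, expert, [shortcut], margin=0.5)
loss_large_margin = counterfactual_ranking_loss(cost_map, expert, [shortcut], margin=1.0)
```

In this toy setup the shortcut already costs more than the expert path, so a small margin yields zero loss while a larger margin still produces a positive penalty. In a learned system that penalty would push the cost of the shortcut's cells higher.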