CV IR RO

Apr 4, 2025

RANa: Retrieval-Augmented Navigation

Gianluca Monaci, Rafael S. Rezende, Romain Deffayet, Gabriela Csurka, Guillaume Bono, Hervé Déjean, Stéphane Clinchant, Christian Wolf

arXiv:2504.03524v2 | 2 citations | h-index: 39 | Trans. Mach. Learn. Res.

Originality: Incremental advance

AI Analysis

The paper addresses inefficient episodic learning in robotic navigation by letting the agent reuse memory from earlier episodes. The advance appears incremental, since it builds on existing retrieval-augmented and foundation-model approaches.

The paper tackles the problem of navigation agents starting each episode with no memory of previous experiences. It introduces a retrieval-augmented agent that queries a database collected from past episodes in the same environment, and reports that retrieval enables zero-shot transfer across tasks and environments while significantly improving performance.

Methods for navigation based on large-scale learning typically treat each episode as a new problem, where the agent is spawned with a clean memory in an unknown environment. While the ability to generalize to an unknown environment is extremely important, we claim that, in a realistic setting, an agent should be able to exploit information collected during earlier robot operations. We address this by introducing a new retrieval-augmented agent, trained with RL, capable of querying a database collected from previous episodes in the same environment and learning how to integrate this additional context information. We introduce a unique agent architecture for the general navigation task, evaluated on ImageNav, Instance-ImageNav and ObjectNav. Our retrieval and context encoding methods are data-driven and employ vision foundation models (FMs) for both semantic and geometric understanding. We propose new benchmarks for these settings and we show that retrieval allows zero-shot transfer across tasks and environments while significantly improving performance.
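
To make the idea of querying a database collected from previous episodes concrete, here is a minimal toy sketch of nearest-neighbour retrieval over stored embeddings. It is not the paper's implementation: the abstract says retrieval and context encoding are data-driven and rely on vision foundation models, whereas this sketch assumes hand-written 2-D vectors and plain cosine similarity. The names `EpisodicDatabase`, `add`, `query`, and the string payloads are all hypothetical.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class EpisodicDatabase:
    """Hypothetical store of (embedding, payload) pairs gathered in earlier episodes."""
    embeddings: list = field(default_factory=list)
    payloads: list = field(default_factory=list)

    def add(self, embedding, payload):
        # In a real system the embedding would come from a learned image encoder;
        # here it is just a list of floats supplied by the caller (assumption).
        self.embeddings.append(list(embedding))
        self.payloads.append(payload)

    def query(self, query_embedding, k=2):
        """Return up to k (similarity, payload) pairs, most similar first."""
        scored = [
            (cosine_similarity(query_embedding, emb), payload)
            for emb, payload in zip(self.embeddings, self.payloads)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Toy usage: three stored observations from past episodes, one query.
db = EpisodicDatabase()
db.add([1.0, 0.0], "kitchen view")
db.add([0.0, 1.0], "hallway view")
db.add([0.9, 0.1], "kitchen doorway")

results = db.query([1.0, 0.0], k=2)
retrieved = [payload for _, payload in results]
```

In this toy setup the retrieved items would then be passed to the agent as extra context. How that context is encoded and integrated into the policy is the part the paper learns with RL, and it is not modelled here.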
