CL, CV, LG, NE, RO | Mar 6, 2019

Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation

arXiv:1903.02547v2 | 185 citations
AI Analysis

This work addresses navigation efficiency for agents in unseen environments, representing an incremental improvement over existing methods.

The paper tackles the Vision-and-Language Navigation problem by introducing the FAST Navigator framework, which balances local and global signals to enable backtracking, achieving a 17% relative gain and 6% absolute gain in SPL on the R2R benchmark.
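For readers unfamiliar with the metric, SPL (Success weighted by Path Length) averages, over episodes, a success indicator scaled by the ratio of the shortest-path length to the longer of the agent's path and the shortest path. The following is a minimal sketch of that computation; the function and argument names are illustrative assumptions and are not taken from the paper's code.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    successes[i]        -- whether episode i ended at the target
    shortest_lengths[i] -- shortest-path length from start to target
    path_lengths[i]     -- length of the path the agent actually took
    """
    n = len(successes)
    if n == 0 or n != len(shortest_lengths) or n != len(path_lengths):
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if not success:
            continue  # failed episodes contribute zero
        denom = max(taken, shortest)
        # Assumption: a successful zero-length episode counts as fully efficient.
        total += shortest / denom if denom > 0 else 1.0
    return total / n


# Three episodes: an optimal success, a success with a 2x detour, and a failure.
example = spl([True, True, False], [10.0, 10.0, 10.0], [10.0, 20.0, 10.0])
```

Here the optimal episode contributes 1.0, the detour contributes 0.5, and the failure contributes 0, so longer paths are penalized even when the agent eventually succeeds.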

We present the Frontier Aware Search with backTracking (FAST) Navigator, a general framework for action decoding that achieves state-of-the-art results on the Room-to-Room (R2R) Vision-and-Language Navigation challenge of Anderson et al. (2018). Given a natural language instruction and photo-realistic image views of a previously unseen environment, the agent is tasked with navigating from source to target location as quickly as possible. While all current approaches make local action decisions or score entire trajectories using beam search, ours balances local and global signals when exploring an unobserved environment. Importantly, this lets us act greedily but use global signals to backtrack when necessary. Applying the FAST framework to existing state-of-the-art models achieves a 17% relative gain, an absolute 6% gain, on Success rate weighted by Path Length (SPL).
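The idea of acting greedily while keeping a frontier of partial trajectories to backtrack to can be illustrated on a toy graph. This is a simplified sketch under stated assumptions, not the paper's implementation: `local_score`, `global_score`, and `is_goal` are hypothetical callbacks (in the real task the agent has no goal oracle and must decide when to stop), and the global signal used in the example, the mean local score along a path, is a stand-in chosen for illustration.

```python
import heapq
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Node = Hashable
Path = Tuple[Node, ...]


def frontier_search(
    neighbors: Callable[[Node], Sequence[Node]],
    start: Node,
    is_goal: Callable[[Node], bool],
    local_score: Callable[[Node, Node], float],
    global_score: Callable[[Path, float], float],
    max_expansions: int = 1000,
) -> Tuple[List[Node], int]:
    """Expand the best-scoring partial path; return (path, backtrack_count)."""
    counter = 0  # unique tie-breaker so the heap never compares paths
    start_path: Path = (start,)
    heap = [(-global_score(start_path, 0.0), counter, start_path, 0.0)]
    expanded = set()
    previous: Path = ()
    backtracks = 0

    for _ in range(max_expansions):
        if not heap:
            break
        _, _, path, local_sum = heapq.heappop(heap)
        node = path[-1]
        if node in expanded:
            continue
        # If the chosen path does not extend the one just expanded,
        # the agent is abandoning its current trajectory: a backtrack.
        if previous and path[:-1] != previous:
            backtracks += 1
        if is_goal(node):
            return list(path), backtracks
        expanded.add(node)
        previous = path
        for nxt in neighbors(node):
            if nxt in expanded:
                continue
            counter += 1
            new_sum = local_sum + local_score(node, nxt)
            new_path = path + (nxt,)
            entry = (-global_score(new_path, new_sum), counter, new_path, new_sum)
            heapq.heappush(heap, entry)
    return [], backtracks


# Toy environment (hypothetical): B looks best locally but leads to a dead end.
toy_graph: Dict[str, List[str]] = {
    "A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": ["G"], "G": [],
}
toy_scores = {
    ("A", "B"): 0.9, ("A", "C"): 0.6, ("B", "D"): 0.1,
    ("C", "E"): 0.8, ("E", "G"): 0.9,
}

path, backtracks = frontier_search(
    neighbors=lambda n: toy_graph[n],
    start="A",
    is_goal=lambda n: n == "G",
    local_score=lambda a, b: toy_scores[(a, b)],
    # Stand-in global signal: mean local score along the path.
    global_score=lambda p, s: s / max(len(p) - 1, 1),
)
```

In this toy run the search first follows the locally attractive edge to `B`, sees that continuing to `D` lowers the path-level score below that of the unexplored branch through `C`, and backtracks once before reaching `G`. A purely greedy decoder would have stopped at the dead end, while beam search would have had to carry both branches forward at every step.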

Code Implementations: 1 repo
