CV | Jul 16, 2024

PRET: Planning with Directed Fidelity Trajectory for Vision and Language Navigation

arXiv:2407.11487v1 | 7 citations | h-index: 12 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the problem of high computational cost in long-term planning for vision and language navigation agents, offering an incremental improvement over existing methods.

The paper tackles the computational inefficiency in vision and language navigation by proposing a method that uses directed fidelity trajectories for planning, achieving state-of-the-art performance on the RxR dataset and comparable results on R2R while significantly reducing computational cost.

Vision and language navigation is a task that requires an agent to navigate according to a natural language instruction. Recent methods predict sub-goals on a constructed topological map at each step to enable long-term action planning. However, they suffer from high computational cost when attempting to support such high-level predictions with GCN-like models. In this work, we propose an alternative method that facilitates navigation planning by considering the alignment between instructions and directed fidelity trajectories, where a directed fidelity trajectory is a path from the initial node to a candidate location on a directed graph without detours. This planning strategy yields an efficient model that still achieves strong performance. Specifically, we introduce a directed graph to represent the explored area of the environment, emphasizing directionality. Next, we define the trajectory representation as a sequence of directed edge features, which are extracted from the panorama according to the corresponding orientation. Finally, we assess and compare the alignment between the instruction and the different trajectories during navigation to determine the next navigation target. Our method outperforms the previous SOTA method, BEVBert, on the RxR dataset and is comparable on the R2R dataset while largely reducing the computational cost. Code is available: https://github.com/iSEE-Laboratory/VLN-PRET.
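The planning loop described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation (see the linked repository for that); it rests on several stated assumptions. "Without detours" is modeled as the fewest-hop path found by breadth-first search on the directed graph. Each directed edge `(u, v)` is assumed to carry a precomputed feature vector (standing in for the panorama view at node `u` facing toward `v`). The learned instruction-trajectory alignment model is replaced by a plain cosine similarity between the mean-pooled edge features and an instruction embedding. All function and variable names (`fidelity_trajectories`, `choose_next_target`, `edge_feat`, `instr_emb`) are hypothetical.

```python
import math
from collections import deque


def fidelity_trajectories(edges, start):
    """Return the fewest-hop (detour-free) path from `start` to every reachable node.

    Assumption: "without detours" is approximated by BFS shortest paths.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    parent = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in parent:  # first visit in BFS order = shortest hop count
                parent[v] = u
                queue.append(v)
    paths = {}
    for node in parent:
        path, cur = [], node
        while cur is not None:  # walk back to the start node
            path.append(cur)
            cur = parent[cur]
        paths[node] = path[::-1]
    return paths


def trajectory_features(path, edge_feat):
    """Represent a trajectory as the sequence of its directed edge features."""
    return [edge_feat[(u, v)] for u, v in zip(path, path[1:])]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def choose_next_target(edges, start, candidates, edge_feat, instr_emb):
    """Score each candidate's trajectory against the instruction; pick the best.

    Hypothetical stand-in: mean pooling + cosine similarity instead of a learned model.
    """
    paths = fidelity_trajectories(edges, start)
    best, best_score = None, -math.inf
    for c in candidates:
        if c == start or c not in paths:
            continue  # only reachable, non-initial candidates have a trajectory
        feats = trajectory_features(paths[c], edge_feat)
        pooled = [sum(f[i] for f in feats) / len(feats) for i in range(len(instr_emb))]
        score = cosine(pooled, instr_emb)
        if score > best_score:
            best, best_score = c, score
    return best, paths.get(best)


if __name__ == "__main__":
    # Toy graph: node 0 is the initial node; (2, 3) would be a detour to node 3.
    edges = [(0, 1), (1, 2), (0, 3), (2, 3)]
    edge_feat = {(0, 1): [1.0, 0.0], (1, 2): [1.0, 0.0], (0, 3): [0.0, 1.0], (2, 3): [0.0, 1.0]}
    instr_emb = [1.0, 0.0]  # toy embedding of an instruction such as "go straight ahead"
    target, path = choose_next_target(edges, 0, [2, 3], edge_feat, instr_emb)
    print(target, path)  # → 2 [0, 1, 2]
```

Because every candidate's trajectory is scored independently from the initial node, this kind of planner avoids running a graph network over the whole topological map at each step, which is the efficiency argument the abstract makes; the toy scoring function here only illustrates that structure.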

Code Implementations: 1 repo