LG · Jul 24, 2022

A Priority Map for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues

arXiv:2207.11717v4 · 6 citations · h-index: 49
Originality: Highly original
AI Analysis

This paper addresses cross-modal alignment and feature-level localization for artificial agents in outdoor VLN. It represents a strong but targeted gain rather than a foundational advance.

The paper tackles the challenge of boosting relevant features in Vision-and-Language Navigation (VLN) by introducing a priority map module inspired by neuropsychology, which integrates with a feature-location framework to double task completion rates and achieve state-of-the-art performance on the Touchdown benchmark.

In a busy city street, a pedestrian surrounded by distractions can pick out a single sign if it is relevant to their route. Artificial agents in outdoor Vision-and-Language Navigation (VLN) are likewise confronted with detecting the supervisory signal on environment features and their location in the inputs. To boost the prominence of relevant features in transformer-based architectures without costly preprocessing and pretraining, we take inspiration from priority maps, a mechanism described in neuropsychological studies. We implement a novel priority map module and pretrain on auxiliary tasks using low-sample datasets with high-level representations of routes and environment-related references to urban features. A hierarchical process of trajectory planning, followed by parameterised visual boost filtering on the visual inputs and prediction of the corresponding textual spans, addresses the core challenges of cross-modal alignment and feature-level localisation. The priority map module is integrated into a feature-location framework that doubles the task completion rates of standalone transformers and attains state-of-the-art performance on the Touchdown benchmark for VLN. Code and data are referenced in Appendix C.
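The abstract describes parameterised visual boost filtering only at a high level. As a rough illustration of the general idea, here is a minimal sketch, assuming the filter acts as a text-conditioned multiplicative gate over visual patch features: each patch is scored against a text query, the scores are softmax-normalised, and relevant patches are amplified. The function name `visual_boost_filter`, the scaled dot-product score, and the strength parameter `alpha` are hypothetical and are not taken from the paper or its code.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def visual_boost_filter(patches: List[Vector], text_query: Vector,
                        alpha: float = 1.0) -> List[Vector]:
    """Hypothetical boost filter (not the authors' method).

    Each visual patch i is scaled by (1 + alpha * w_i), where w_i is the
    softmax-normalised scaled dot-product relevance of patch i to the text
    query. alpha stands in for an assumed learnable boost strength.
    """
    d = len(text_query)
    if any(len(p) != d for p in patches):
        raise ValueError("patch and query dimensions must match")
    # Relevance of each patch to the textual cue (scaled dot product).
    scores = [sum(a * b for a, b in zip(p, text_query)) / math.sqrt(d)
              for p in patches]
    weights = softmax(scores)
    # Amplify patches in proportion to their relevance; alpha = 0 is a no-op.
    return [[(1.0 + alpha * w) * x for x in p] for p, w in zip(patches, weights)]


if __name__ == "__main__":
    patches = [[1.0, 0.0], [0.0, 1.0]]  # two toy visual patches
    query = [2.0, 0.0]                  # toy text embedding aligned with patch 0
    for row in visual_boost_filter(patches, query, alpha=1.0):
        print(row)
```

With this toy input, the patch aligned with the query is amplified more than the other, while `alpha=0` leaves the features unchanged. A multiplicative gate is chosen here only because it boosts relevant features without discarding the rest of the visual input.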

Code Implementations: 1 repo
