CL · Sep 16, 2020

Generative Language-Grounded Policy in Vision-and-Language Navigation with Bayes' Rule

arXiv:2009.07783v3 · 30 citations
AI Analysis

This work addresses the problem of improving navigation agents for robotics and AI assistants by introducing a generative alternative. The contribution is incremental, since it builds on existing discriminative methods.

The paper tackles vision-and-language navigation (VLN) by proposing a generative language-grounded policy. This policy outperforms discriminative approaches on the Room-2-Room and Room-4-Room datasets, particularly in unseen environments, and combining the generative and discriminative policies achieves near state-of-the-art results on Room-2-Room.

Vision-and-language navigation (VLN) is a task in which an agent is embodied in a realistic 3D environment and follows an instruction to reach the goal node. While most previous studies have built and investigated a discriminative approach, we notice that there are in fact two possible approaches to building such a VLN agent: discriminative and generative. In this paper, we design and investigate a generative language-grounded policy which uses a language model to compute the distribution over all possible instructions, i.e., all possible sequences of vocabulary tokens, given the action and the transition history. In experiments, we show that the proposed generative approach outperforms the discriminative approach on the Room-2-Room (R2R) and Room-4-Room (R4R) datasets, especially in unseen environments. We further show that combining the generative and discriminative policies achieves results close to the state of the art on the R2R dataset, demonstrating that the two policies capture different aspects of VLN.
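The generative policy in the title turns a language model into an action distribution via Bayes' rule: p(a_t | X, h_t) ∝ p(X | a_t, h_t) p(a_t | h_t), where X is the instruction, a_t a candidate action and h_t the transition history. The Python below is a minimal sketch of that normalization step only, assuming the per-action log-likelihoods log p(X | a_t, h_t) have already been computed by some language model. The function names, the uniform default prior, the example scores and the log-linear mixing weight `lam` in `combined_policy` are all hypothetical illustrations, not the paper's implementation or its exact combination rule.

```python
import math
from typing import Dict, Optional


def log_softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Normalize unnormalized log-scores into log-probabilities (stable log-sum-exp)."""
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {a: s - log_z for a, s in scores.items()}


def generative_policy(
    instr_loglik: Dict[str, float],
    log_prior: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Bayes' rule: log p(a | X, h) = log p(X | a, h) + log p(a | h) - log Z.

    instr_loglik[a] is assumed to be log p(X | a, h): a language model's
    log-likelihood of the whole instruction X after taking action a from history h.
    If log_prior is omitted, a uniform action prior is assumed (hypothetical default).
    """
    if log_prior is None:
        # A constant log-prior cancels in normalization, i.e. a uniform prior.
        log_prior = {a: 0.0 for a in instr_loglik}
    return log_softmax({a: instr_loglik[a] + log_prior[a] for a in instr_loglik})


def combined_policy(
    gen_logp: Dict[str, float], disc_logp: Dict[str, float], lam: float = 0.5
) -> Dict[str, float]:
    """Hypothetical log-linear mix of a generative and a discriminative policy."""
    return log_softmax(
        {a: lam * gen_logp[a] + (1.0 - lam) * disc_logp[a] for a in gen_logp}
    )


# Made-up LM log-likelihoods for three candidate actions (illustrative only).
instr_loglik = {"left": -42.0, "forward": -40.0, "stop": -45.0}
gen = generative_policy(instr_loglik)
best = max(gen, key=gen.get)  # action under which the instruction is most likely

# Made-up discriminative log-probabilities for the same actions.
disc = log_softmax({"left": 1.0, "forward": 0.5, "stop": -1.0})
mixed = combined_policy(gen, disc, lam=0.5)
```

The point of the sketch is that the generative policy never scores actions directly: it asks how well each candidate action explains the entire instruction, which is one intuition for why such a policy might transfer better to unseen environments.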
