CV, CL, RO · Sep 30, 2021

Language-Aligned Waypoint (LAW) Supervision for Vision-and-Language Navigation in Continuous Environments

arXiv:2109.15207v1683 citations
Originality: Incremental advance
AI Analysis

The paper addresses a specific challenge in VLN for embodied agents, but it appears incremental because it builds on prior supervision methods.

The paper tackles the problem of misalignment between goal-oriented supervision and language instructions in Vision-and-Language Navigation (VLN) for off-path scenarios, proposing a language-aligned supervision scheme and a new metric to measure sub-instruction completion.

In the Vision-and-Language Navigation (VLN) task an embodied agent navigates a 3D environment, following natural language instructions. A challenge in this task is how to handle 'off the path' scenarios where an agent veers from a reference path. Prior work supervises the agent with actions based on the shortest path from the agent's location to the goal, but such goal-oriented supervision is often not in alignment with the instruction. Furthermore, the evaluation metrics employed by prior work do not measure how much of a language instruction the agent is able to follow. In this work, we propose a simple and effective language-aligned supervision scheme, and a new metric that measures the number of sub-instructions the agent has completed during navigation.
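The contrast between goal-oriented and language-aligned supervision can be illustrated with a minimal sketch. It is not the paper's implementation: the 2D waypoint representation, the function names, the "next waypoint after the nearest one" rule, and the radius-based completion count are all assumptions made for illustration, with reference-path waypoints standing in for sub-instructions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def goal_oriented_target(reference_path: Sequence[Point], agent_pos: Point) -> Point:
    """Goal-oriented supervision: always steer toward the final goal,
    regardless of where the instruction's path actually goes."""
    return reference_path[-1]


def language_aligned_target(reference_path: Sequence[Point], agent_pos: Point) -> Point:
    """Language-aligned supervision (assumed rule): find the reference-path
    waypoint nearest to the agent, then steer toward the waypoint after it,
    so an off-path agent is pulled back onto the described route."""
    nearest = min(range(len(reference_path)),
                  key=lambda i: dist(agent_pos, reference_path[i]))
    nxt = min(nearest + 1, len(reference_path) - 1)
    return reference_path[nxt]


def completed_waypoints(reference_path: Sequence[Point],
                        trajectory: Sequence[Point],
                        radius: float = 0.5) -> int:
    """Hypothetical proxy for a sub-instruction metric: count how many
    reference waypoints the agent reached, in order, within `radius`."""
    next_wp = 0
    for pos in trajectory:
        while next_wp < len(reference_path) and dist(pos, reference_path[next_wp]) <= radius:
            next_wp += 1
    return next_wp


# Hypothetical L-shaped reference path and an agent that has drifted off it.
reference_path = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (4.0, 2.0)]
agent_pos = (1.5, 0.5)
trajectory = [(0.0, 0.1), (1.5, 0.5), (2.1, 0.2), (2.0, 1.9)]

goal_target = goal_oriented_target(reference_path, agent_pos)
law_target = language_aligned_target(reference_path, agent_pos)
n_done = completed_waypoints(reference_path, trajectory)
```

In this toy setup the goal-oriented target is the final waypoint, while the language-aligned target is an intermediate waypoint on the described route, which is the misalignment the abstract points to.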

