RO · AI · CL · CV · Dec 9, 2020

Topological Planning with Transformers for Vision-and-Language Navigation

arXiv:2012.05292v1 · 163 citations
AI Analysis

This work addresses the challenge of vision-and-language navigation in complex, freely traversable environments for autonomous agents, offering an incremental improvement over existing end-to-end methods.

The paper proposes a modular approach for vision-and-language navigation (VLN) in freely traversable environments, using topological maps and attention mechanisms to predict navigation plans. This method outperforms previous end-to-end approaches, generating interpretable plans and demonstrating intelligent behaviors like backtracking.

Conventional approaches to vision-and-language navigation (VLN) are trained end-to-end but struggle to perform well in freely traversable environments. Inspired by the robotics community, we propose a modular approach to VLN using topological maps. Given a natural language instruction and topological map, our approach leverages attention mechanisms to predict a navigation plan in the map. The plan is then executed with low-level actions (e.g. forward, rotate) using a robust controller. Experiments show that our method outperforms previous end-to-end approaches, generates interpretable navigation plans, and exhibits intelligent behaviors such as backtracking.
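The last stage of the pipeline described above, turning a plan in the topological map into low-level actions, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the node names, the 2D positions, the `plan_to_actions` helper, and the purely geometric "controller" are hypothetical and are not the paper's planner or controller. In the paper the plan is predicted with attention mechanisms; here it is simply supplied by hand.

```python
import math

# Hypothetical toy topological map: node -> (x, y) position in metres,
# plus undirected edges between traversable neighbours.
NODES = {"hall": (0.0, 0.0), "door": (2.0, 0.0), "kitchen": (2.0, 3.0)}
EDGES = {("hall", "door"), ("door", "kitchen")}


def connected(a, b):
    """Return True if the map has an edge between nodes a and b."""
    return (a, b) in EDGES or (b, a) in EDGES


def plan_to_actions(plan, heading_deg=0.0):
    """Convert a node-sequence plan into (action, amount) pairs.

    'rotate' amounts are signed degrees in [-180, 180);
    'forward' amounts are metres. This is a stand-in for a real controller.
    """
    actions = []
    for a, b in zip(plan, plan[1:]):
        if not connected(a, b):
            raise ValueError(f"plan uses missing edge {a} -> {b}")
        (ax, ay), (bx, by) = NODES[a], NODES[b]
        # Heading needed to face the next node.
        target = math.degrees(math.atan2(by - ay, bx - ax))
        # Wrap the turn so the agent takes the shorter rotation.
        turn = (target - heading_deg + 180.0) % 360.0 - 180.0
        if abs(turn) > 1e-9:
            actions.append(("rotate", turn))
        actions.append(("forward", math.hypot(bx - ax, by - ay)))
        heading_deg = target
    return actions


# A hand-written plan standing in for the predicted one.
ACTIONS = plan_to_actions(["hall", "door", "kitchen"])
```

Because the plan is an explicit node sequence, it can be inspected before execution, and a plan that revisits a node (for example `["hall", "door", "hall"]`) expresses backtracking as a half-turn followed by a forward move.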

