LG, CL, CV | Jun 3, 2021

Grounding Complex Navigational Instructions Using Scene Graphs

arXiv:2106.01607v1
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of providing better supervision for training agents to follow natural language navigation instructions. The contribution is incremental: it builds on existing datasets and methods.

The authors tackled the problem of limited supervision for training reinforcement learning agents to follow natural language instructions by adapting the CLEVR dataset to generate complex navigation instructions and scene graphs, resulting in an environment-agnostic supervised dataset. They demonstrated its use by mapping scenes to VizDoom and training an agent with a gated attention architecture to carry out these instructions.
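To make the gated attention idea concrete, here is a minimal pure-Python sketch of one common formulation: the instruction embedding is projected to one sigmoid gate per visual feature map, and each map is scaled by its gate. The function name, argument layout, and toy numbers are assumptions for illustration; whether this matches the cited architecture in every detail is not confirmed by this page.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_attention(feature_maps, instruction_embedding, weights, bias):
    """Scale each feature map by a gate computed from the instruction.

    feature_maps: list of d maps, each an H x W nested list of floats.
    instruction_embedding: vector of length k.
    weights: d x k projection matrix; bias: vector of length d.
    (All names and shapes are hypothetical, chosen for this sketch.)
    """
    # One gate in (0, 1) per feature map.
    gates = [
        sigmoid(sum(w * e for w, e in zip(row, instruction_embedding)) + b)
        for row, b in zip(weights, bias)
    ]
    # Broadcast each gate over its map (element-wise product).
    return [
        [[gate * value for value in row] for row in fmap]
        for gate, fmap in zip(gates, feature_maps)
    ]


# Toy input: two 2x2 feature maps and a 2-dimensional instruction embedding.
feature_maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
instruction_embedding = [1.0, 0.0]
weights = [[0.0, 0.0], [10.0, 0.0]]  # second map is strongly favoured
bias = [0.0, 0.0]

gated = gated_attention(feature_maps, instruction_embedding, weights, bias)
```

In this toy setup the first map is halved (its gate is sigmoid(0) = 0.5), while the second passes through almost unchanged, which is how the instruction can emphasise the visual channels relevant to the named object.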

Training a reinforcement learning agent to carry out natural language instructions is limited by the available supervision, i.e., knowing when the instruction has been carried out. We adapt the CLEVR visual question answering dataset to generate complex natural language navigation instructions and accompanying scene graphs, yielding an environment-agnostic supervised dataset. To demonstrate the use of this dataset, we map the scenes to the VizDoom environment and use the architecture in \citet{gatedattention} to train an agent to carry out these more complex language instructions.
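As a rough illustration of how a scene graph can supply the missing supervision, the sketch below resolves an instruction against a small scene and checks whether an agent position satisfies it. The `SceneObject` fields, the `left_of` rule (smaller x coordinate), the success radius, and all function names are assumptions made for this example, not the paper's actual data format or code.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class SceneObject:
    # Hypothetical CLEVR-style attributes plus a 2D position.
    color: str
    shape: str
    size: str
    x: float
    y: float


def matches(obj, attrs):
    """True if the object has every attribute value named in attrs."""
    return all(getattr(obj, key) == value for key, value in attrs.items())


def resolve_targets(scene, target_attrs, left_of_attrs=None):
    """Return objects matching target_attrs, optionally keeping only those
    with a smaller x than some object matching left_of_attrs."""
    candidates = [o for o in scene if matches(o, target_attrs)]
    if left_of_attrs is None:
        return candidates
    anchors = [o for o in scene if matches(o, left_of_attrs)]
    return [o for o in candidates if any(o.x < a.x for a in anchors)]


def instruction_done(agent_xy, targets, radius=1.0):
    """Supervision signal: has the agent reached any valid target?"""
    return any(dist(agent_xy, (t.x, t.y)) <= radius for t in targets)


scene = [
    SceneObject("red", "cube", "large", 0.0, 0.0),
    SceneObject("red", "cube", "small", 6.0, 0.0),
    SceneObject("blue", "sphere", "large", 3.0, 2.0),
]

# "Go to the red cube that is left of the blue sphere."
targets = resolve_targets(
    scene,
    {"color": "red", "shape": "cube"},
    left_of_attrs={"color": "blue", "shape": "sphere"},
)
```

Because the check depends only on the scene description and the agent's position, the same logic could in principle be reused in any environment the scenes are mapped to, which is the sense in which such a dataset is environment-agnostic.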

