Towards Navigation by Reasoning over Spatial Configurations
This work addresses navigation by agents that follow natural language instructions. The contribution is incremental: it builds on existing methods and adds a focus on spatial configurations.
The paper tackles the navigation problem in which an agent follows natural language instructions. It emphasizes spatial semantics to ground instructions in visual perception, improving on strong baselines in seen environments and achieving competitive results in unseen ones.
We address the navigation problem in which an agent follows natural language instructions while observing the environment. Focusing on language understanding, we show the importance of spatial semantics in grounding navigation instructions in visual perception. We propose a neural agent that uses the elements of spatial configurations, and we investigate their influence on the agent's reasoning ability. Moreover, we model the sequential execution order and align visual objects with the spatial configurations in the instruction. Our neural agent improves on strong baselines in the seen environments and shows competitive performance in the unseen environments. Additionally, the experimental results demonstrate that explicitly modeling the spatial semantic elements in the instructions can improve the model's grounding and spatial reasoning.
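To make the idea of aligning visual objects with spatial configurations concrete, the following is a minimal sketch, not the authors' implementation. It assumes each spatial configuration (a sub-instruction) and each visible object has already been embedded as a vector, and it scores them with plain dot-product attention. The function names (`softmax`, `dot`, `align`), the example phrases, and the toy two-dimensional embeddings are all hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def align(config_vecs, object_vecs):
    """For each configuration vector, return attention weights over objects.

    Configurations are kept in instruction order, so row i corresponds to
    the i-th step of the instruction.
    """
    return [softmax([dot(c, o) for o in object_vecs]) for c in config_vecs]

# Hypothetical toy embeddings (assumed, not taken from the paper).
configs = {
    "walk past the sofa": [1.0, 0.0],
    "stop at the door": [0.0, 1.0],
}
objects = {
    "sofa": [2.0, 0.0],
    "door": [0.0, 2.0],
    "lamp": [0.5, 0.5],
}

weights = align(list(configs.values()), list(objects.values()))

# Report the most strongly aligned object for each configuration.
for text, row in zip(configs, weights):
    best = max(zip(objects, row), key=lambda pair: pair[1])[0]
    print(text, "->", best)
```

In a trained agent the embeddings would come from learned language and vision encoders, and the attention weights would feed into action prediction. This sketch shows only the alignment step under the stated assumptions.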