CV | Mar 18, 2024

Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation

arXiv:2403.11541v3 | 4 citations | h-index: 1 | Has Code | IEEE Robot Autom Lett
Originality: Incremental advance
AI Analysis

This paper addresses navigation challenges for AI agents in vision-and-language tasks, but it appears incremental: it builds on existing VLN methods with specific enhancements.

The paper tackles the problem of inaccurate decisions in Vision-and-Language Navigation (VLN) due to limited reasoning, proposing a Hierarchical Spatial Proximity Reasoning method that improves navigation efficiency and accuracy, validated on datasets like REVERIE, SOON, R2R, and R4R.

Most Vision-and-Language Navigation (VLN) algorithms are prone to making inaccurate decisions due to their lack of visual common sense and limited reasoning capabilities. To address this issue, we propose a Hierarchical Spatial Proximity Reasoning (HSPR) method. First, we introduce a scene understanding auxiliary task to help the agent build a knowledge base of hierarchical spatial proximity. This task utilizes panoramic views and object features to identify types of nodes and uncover the adjacency relationships among nodes, among objects, and between nodes and objects. Second, we propose a multi-step reasoning navigation algorithm based on the hierarchical spatial proximity knowledge base, which continuously plans feasible paths to enhance exploration efficiency. Third, we introduce a residual fusion method to improve navigation decision accuracy. Finally, we validate our approach with experiments on publicly available datasets including REVERIE, SOON, R2R, and R4R. Our code is available at https://github.com/iCityLab/HSPR
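
To make the second and third steps of the abstract more concrete, here is a minimal sketch of what multi-step planning over a proximity knowledge base and a residual-style score fusion could look like. It is not the authors' implementation (see their repository for that). The room types, the object-to-room table, the function names `plan_room_path` and `residual_fuse`, the use of breadth-first search, and the `weight` parameter are all assumptions chosen for illustration.

```python
from collections import deque

# Hypothetical proximity knowledge: which room types tend to be adjacent,
# and which room type each object tends to appear in. Values are made up.
ROOM_ADJACENCY = {
    "hallway": ["living room", "bedroom", "kitchen"],
    "living room": ["hallway", "kitchen"],
    "kitchen": ["hallway", "living room"],
    "bedroom": ["hallway", "bathroom"],
    "bathroom": ["bedroom"],
}
OBJECT_TO_ROOM = {"sink": "kitchen", "towel": "bathroom", "sofa": "living room"}


def plan_room_path(start_room, target_object):
    """Breadth-first search over room types toward the object's likely room.

    Returns the shortest list of room types from start_room to the goal,
    or None if the goal cannot be reached in the adjacency table.
    """
    goal = OBJECT_TO_ROOM[target_object]
    queue = deque([[start_room]])
    visited = {start_room}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in ROOM_ADJACENCY.get(path[-1], []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


def residual_fuse(base_scores, reasoning_scores, weight=0.5):
    """Add weighted reasoning scores on top of the base scores.

    Assumption: "residual fusion" is illustrated here as base + weight * extra;
    the paper's actual formulation may differ.
    """
    return [b + weight * r for b, r in zip(base_scores, reasoning_scores)]


if __name__ == "__main__":
    print(plan_room_path("living room", "towel"))
    print(residual_fuse([1.0, 2.0], [2.0, 0.0]))
```

In this toy setup, an agent in the living room looking for a towel would plan through the hallway and bedroom to reach the bathroom, and the fused scores keep the base prediction intact while letting the reasoning signal shift it.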

Code Implementations: 2 repos