VLN-NF: Feasibility-Aware Vision-and-Language Navigation with False-Premise Instructions

Hung-Ting Su, Ting-Jun Wang, Jia-Fong Yeh, Min Sun, Winston H. Hsu

arXiv:2604.10533 · 15.42 citations · h-index: 9

Predicted impact: top 31% in RO (last 90 days) · Originality: Incremental advance

AI Analysis

This work addresses the practical problem of handling unreliable instructions in VLN; it is an incremental step toward more robust embodied agents.

The paper introduces VLN-NF, a benchmark for Vision-and-Language Navigation with false-premise instructions, in which the target is absent and agents must output NOT-FOUND. The proposed method, ROAM, achieves the best REV-SPL among the compared methods, while baselines tend to under-explore and terminate prematurely.
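The summary does not give the REV-SPL formula, only that it jointly scores room reaching, exploration coverage, and decision correctness. The sketch below starts from the standard SPL term (success weighted by l_i / max(p_i, l_i), where l_i is the shortest-path length and p_i the path actually taken) and adds a hypothetical combination: an episode succeeds only if the agent reaches the target room and outputs the correct decision, and its SPL term is then scaled by coverage. The `Episode` fields, `rev_spl_sketch`, and the gating/scaling rule are assumptions for illustration, not the paper's definition.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    reached_room: bool      # agent stopped inside the instruction's target room
    coverage: float         # fraction of the room's navigable area observed, in [0, 1]
    predicted: str          # agent's final decision, e.g. "FOUND" or "NOT-FOUND"
    expected: str           # ground-truth decision ("NOT-FOUND" for false-premise episodes)
    path_length: float      # length of the path the agent actually took (p_i)
    shortest_length: float  # shortest-path length to the target room (l_i)


def spl_term(success: bool, shortest: float, taken: float) -> float:
    """Standard per-episode SPL term: S_i * l_i / max(p_i, l_i)."""
    if not success:
        return 0.0
    denom = max(taken, shortest)
    return 1.0 if denom <= 0 else shortest / denom  # start == goal counts as perfect


def rev_spl_sketch(episodes: Sequence[Episode]) -> float:
    """HYPOTHETICAL REV-SPL-style score, not the paper's formula.

    Room reaching and decision correctness gate success; coverage
    (clipped to [0, 1]) scales the path-efficiency term.
    """
    if not episodes:
        return 0.0
    total = 0.0
    for ep in episodes:
        success = ep.reached_room and ep.predicted == ep.expected
        coverage = min(max(ep.coverage, 0.0), 1.0)
        total += coverage * spl_term(success, ep.shortest_length, ep.path_length)
    return total / len(episodes)
```

For example, an agent that reaches the room, covers 80% of it, correctly answers NOT-FOUND, and walks 12 m against a 10 m shortest path scores 0.8 × 10/12; a second episode that wrongly answers FOUND scores 0, so the mean over both is half of that.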

Conventional Vision-and-Language Navigation (VLN) benchmarks assume that instructions are feasible and that the referenced target exists, leaving agents ill-equipped to handle false-premise goals. We introduce VLN-NF, a benchmark with false-premise instructions in which the target is absent from the specified room, and agents must navigate there, gather evidence through in-room exploration, and explicitly output NOT-FOUND. VLN-NF is constructed via a scalable pipeline that rewrites VLN instructions using an LLM and verifies target absence with a VLM, producing plausible yet factually incorrect goals. We further propose REV-SPL to jointly evaluate room reaching, exploration coverage, and decision correctness. To address this challenge, we present ROAM, a two-stage hybrid that combines supervised room-level navigation with LLM/VLM-driven in-room exploration guided by a free-space clearance prior. ROAM achieves the best REV-SPL among the compared methods, while baselines often under-explore and terminate prematurely under unreliable instructions. The VLN-NF project page is available at https://vln-nf.github.io/.
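The construction pipeline described in the abstract (LLM rewrite, then VLM verification of absence) can be sketched as a generate-and-filter loop. This is a minimal sketch under stated assumptions: the LLM and VLM are abstracted as plain callables, and `build_false_premise_set`, `FalsePremiseSample`, `max_attempts`, and the callable signatures are hypothetical names, not the authors' code or any specific model API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical interfaces; any LLM/VLM client could be wrapped to match them.
# rewrite(instruction) -> (rewritten instruction, fabricated target) or None
RewriteFn = Callable[[str], Optional[Tuple[str, str]]]
# object_present(room_id, object_name) -> True if the VLM detects the object in the room
PresenceFn = Callable[[str, str], bool]


@dataclass
class FalsePremiseSample:
    original: str
    rewritten: str
    room_id: str
    fake_target: str


def build_false_premise_set(
    samples: Iterable[Tuple[str, str]],  # (original instruction, target room id)
    rewrite: RewriteFn,
    object_present: PresenceFn,
    max_attempts: int = 3,
) -> List[FalsePremiseSample]:
    """Keep a rewritten instruction only if the VLM check says its target is absent."""
    accepted: List[FalsePremiseSample] = []
    for instruction, room_id in samples:
        for _ in range(max_attempts):
            proposal = rewrite(instruction)
            if proposal is None:  # the LLM declined or produced nothing usable
                continue
            new_instruction, target = proposal
            if not object_present(room_id, target):  # verified absent: a valid false premise
                accepted.append(
                    FalsePremiseSample(instruction, new_instruction, room_id, target)
                )
                break
    return accepted
```

With toy stand-ins (a string-replacement "LLM" and a lookup-table "VLM"), an instruction whose substituted object is actually present in the room is rejected, which is the role the VLM verification step plays in keeping only factually incorrect goals.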
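The abstract says ROAM's in-room exploration is guided by a free-space clearance prior but does not define it. One plausible reading, assumed here, is a clearance map on a 2D occupancy grid (the step distance from each free cell to the nearest obstacle), used to prefer open, unvisited waypoints. In this sketch, `clearance_map`, `pick_waypoint`, the 4-connected BFS distance, and the `semantic_score` dictionary (a hypothetical stand-in for LLM/VLM preferences) are all illustrative assumptions, not ROAM's implementation.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]


def clearance_map(grid: List[List[int]]) -> List[List[int]]:
    """Multi-source BFS: 4-connected step distance from each cell to the nearest obstacle (1)."""
    rows, cols = len(grid), len(grid[0])
    far = rows * cols  # upper bound left in place when the grid has no obstacles
    dist = [[far] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1:
                dist[r][c] = 0
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] > dist[r][c] + 1:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def pick_waypoint(
    grid: List[List[int]],
    visited: Set[Cell],
    semantic_score: Optional[Dict[Cell, float]] = None,
    weight: float = 1.0,
) -> Optional[Cell]:
    """Choose the unvisited free cell maximizing clearance + weight * semantic score."""
    clear = clearance_map(grid)
    scores = semantic_score or {}
    best: Optional[Cell] = None
    best_score = float("-inf")
    for r, row in enumerate(grid):
        for c, occupied in enumerate(row):
            if occupied == 1 or (r, c) in visited:
                continue
            score = clear[r][c] + weight * scores.get((r, c), 0.0)
            if score > best_score:  # strict '>' keeps the first maximum in row-major order
                best, best_score = (r, c), score
    return best  # None once every free cell has been visited
```

Biasing waypoints toward high-clearance cells keeps the agent away from walls and furniture, where views are occluded. The additive semantic term shows one simple way a language-model preference could override pure geometry when it is strong enough.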

