AI, Jan 9

Open-Vocabulary 3D Instruction Ambiguity Detection

arXiv:2601.05991v1, 1 citation, h-index: 6
Originality: Incremental advance
AI Analysis

The paper addresses a critical safety gap in embodied AI by focusing on instruction ambiguity detection. The advance is rated incremental: it introduces a new task and benchmark, but builds on existing vision-language models.

The paper tackles the problem of detecting ambiguous instructions in 3D scenes for safety-critical embodied AI. It proposes a new task, a benchmark (Ambi3D, with over 700 scenes and around 22k instructions), and a method (AmbiVer) that the authors report to be effective at judging ambiguity, though state-of-the-art 3D LLMs still struggle with the task.

In safety-critical domains, linguistic ambiguity can have severe consequences; a vague command like "Pass me the vial" in a surgical setting could lead to catastrophic errors. Yet, most embodied AI research overlooks this, assuming instructions are clear and focusing on execution rather than confirmation. To address this critical safety gap, we are the first to define Open-Vocabulary 3D Instruction Ambiguity Detection, a fundamental new task where a model must determine if a command has a single, unambiguous meaning within a given 3D scene. To support this research, we build Ambi3D, a large-scale benchmark for this task, featuring over 700 diverse 3D scenes and around 22k instructions. Our analysis reveals a surprising limitation: state-of-the-art 3D Large Language Models (LLMs) struggle to reliably determine if an instruction is ambiguous. To address this challenge, we propose AmbiVer, a two-stage framework that collects explicit visual evidence from multiple views and uses it to guide a vision-language model (VLM) in judging instruction ambiguity. Extensive experiments demonstrate the challenge of our task and the effectiveness of AmbiVer, paving the way for safer and more trustworthy embodied AI. Code and dataset available at https://jiayuding031020.github.io/ambi3d/.
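
To make the task definition concrete, here is a minimal toy sketch of the decision the task asks for: does a command pick out exactly one object in the scene? This is not AmbiVer, which according to the abstract gathers multi-view visual evidence and passes it to a VLM. The `SceneObject` type, the symbolic scene list, and the exact-match rule are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical scene entry: a category label plus free-form attributes."""
    label: str
    attributes: frozenset = frozenset()


def matching_objects(scene, label, attributes=()):
    """Return every object with the given label that has all requested attributes."""
    wanted = set(attributes)
    return [o for o in scene if o.label == label and wanted <= o.attributes]


def is_ambiguous(scene, label, attributes=()):
    """Toy rule (an assumption): ambiguous when more than one object matches.

    Zero matches is a different failure (nothing to refer to) and is
    reported as not ambiguous here.
    """
    return len(matching_objects(scene, label, attributes)) > 1


# A toy version of the surgical example: two vials and one scalpel.
scene = [
    SceneObject("vial", frozenset({"red"})),
    SceneObject("vial", frozenset({"blue"})),
    SceneObject("scalpel"),
]

print(is_ambiguous(scene, "vial"))           # "Pass me the vial": two candidates
print(is_ambiguous(scene, "vial", ["red"]))  # "Pass me the red vial": one candidate
```

In this toy setting, adding the attribute "red" resolves the ambiguity. The real task is harder because the scene is a 3D capture rather than a list of symbols and the instructions are open-vocabulary.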

