CL · Mar 20

When Contextual Inference Fails: Cancelability in Interactive Instruction Following

arXiv:2603.19997 · 28.3 · h-index: 18
AI Analysis

This work addresses a challenge in human-AI interaction for collaborative tasks, but the contribution is incremental because it builds on an existing psycholinguistic paradigm.

The paper examines how large language models (LLMs) handle ambiguous instructions in interactive tasks. It finds that models register unreliable speakers in their confidence ratings but fail to use that information to guide efficient clarification, falling back on suboptimal strategies such as over-clarification or guessing.

We investigate the separation of literal interpretation from contextual inference in a collaborative block-building task where a builder must resolve underspecified instructions using contextual inferences. Building on an existing two-speaker psycholinguistic paradigm -- which contrasts a pragmatically cooperative speaker with one who is only literally reliable -- we introduce Build What I Mean (BWIM), an interactive benchmark for contextual meaning construction. In BWIM, models must resolve ambiguity by either performing a contextual inference or requesting clarification at a small communication cost. Evaluating several state-of-the-art LLMs, we find a dissociation between judgment and action: while models detect speaker unreliability in explicit confidence ratings, they fail to exploit this information to guide efficient clarification behavior. Instead, we observe suboptimal strategies, such as partner-blind over-clarification and question-averse guessing under uncertainty.
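The trade-off described above (act on a contextual inference, or pay a small cost to ask) can be illustrated with a simple expected-cost rule. This is only a sketch under stated assumptions, not the benchmark's scoring or the authors' method: the function name `should_ask`, the cost values, and the probabilities are all hypothetical and chosen purely for illustration.

```python
def should_ask(p_inference_correct: float, error_cost: float, question_cost: float) -> bool:
    """Return True when asking is cheaper in expectation than acting on the inference.

    p_inference_correct: the builder's estimate that the contextual inference
        matches what this speaker meant (hypothetical quantity).
    error_cost: penalty for building the wrong thing (assumed value).
    question_cost: the small communication cost of a clarification (assumed value).
    """
    expected_guess_cost = (1.0 - p_inference_correct) * error_cost
    return question_cost < expected_guess_cost


# Hypothetical costs, not taken from the paper.
ERROR_COST = 1.0
QUESTION_COST = 0.2

# Pragmatically cooperative speaker: the inference is usually right, so guessing is cheap.
ask_cooperative = should_ask(0.95, ERROR_COST, QUESTION_COST)

# Literally reliable but uncooperative speaker: the inference is a coin flip, so asking pays off.
ask_literal_only = should_ask(0.50, ERROR_COST, QUESTION_COST)

print(ask_cooperative, ask_literal_only)
```

Under this toy rule, a builder that always asks regardless of the speaker corresponds to partner-blind over-clarification, and one that never asks corresponds to question-averse guessing. The dissociation reported in the abstract is that models appear to track something like `p_inference_correct` in their confidence ratings without letting it drive the ask-or-guess choice.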
