CL | Jun 1, 2025

Trick or Neat: Adversarial Ambiguity and Language Model Evaluation

arXiv:2506.01205v12 citations | h-index: 7 | Has Code | ACL
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of ambiguity detection in language understanding for AI researchers. The advance is incremental, since it builds on existing probing methods.

The paper evaluates language models' sensitivity to ambiguity by introducing an adversarial ambiguity dataset that covers syntactic, lexical, and phonological ambiguities together with adversarial variations. It finds that linear probes on model representations can decode ambiguity with high accuracy, sometimes exceeding 90%, while direct prompting fails to identify ambiguity robustly.

Detecting ambiguity is important for language understanding, including uncertainty estimation, humour detection, and processing garden path sentences. We assess language models' sensitivity to ambiguity by introducing an adversarial ambiguity dataset that includes syntactic, lexical, and phonological ambiguities along with adversarial variations (e.g., word-order changes, synonym replacements, and random-based alterations). Our findings show that direct prompting fails to robustly identify ambiguity, while linear probes trained on model representations can decode ambiguity with high accuracy, sometimes exceeding 90%. Our results offer insights into the prompting paradigm and how language models encode ambiguity at different layers. We release both our code and data: https://github.com/coastalcph/lm_ambiguity.
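
To make the idea of a linear probe concrete, here is a minimal sketch, not the authors' implementation. It fits a logistic-regression probe on synthetic vectors that stand in for hidden states from one model layer. The helper names (`make_synthetic_states`, `train_linear_probe`, `probe_accuracy`), the label convention (1 = ambiguous, 0 = unambiguous), and the assumption that the two classes differ by a shift along a single direction are all hypothetical choices made for illustration.

```python
import math
import random


def make_synthetic_states(n, dim, seed=0):
    """Return n (vector, label) pairs standing in for layer activations.

    Assumption: label 1 = ambiguous, 0 = unambiguous, and the two classes
    differ by a shift along one hidden direction. Real hidden states would
    come from a language model, not from this generator.
    """
    rng = random.Random(seed)
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    data = []
    for i in range(n):
        label = i % 2
        sign = 1.0 if label == 1 else -1.0
        vec = [rng.gauss(0.0, 1.0) + sign * d for d in direction]
        data.append((vec, label))
    rng.shuffle(data)
    return data


def _logit(w, b, x):
    # Linear score: dot(w, x) + b.
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_probe(data, dim, epochs=30, lr=0.05):
    """Fit logistic regression (a linear probe) with plain SGD."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            # Clamp the logit so math.exp cannot overflow.
            z = max(-30.0, min(30.0, _logit(w, b, x)))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def probe_accuracy(w, b, data):
    """Fraction of examples whose predicted class matches the label."""
    correct = sum(1 for x, y in data if (_logit(w, b, x) > 0.0) == (y == 1))
    return correct / len(data)


if __name__ == "__main__":
    data = make_synthetic_states(400, 16, seed=0)
    train, test = data[:300], data[300:]
    w, b = train_linear_probe(train, 16)
    print(f"held-out probe accuracy: {probe_accuracy(w, b, test):.2f}")
```

In a real setting, the synthetic vectors would be replaced by representations extracted from each layer of the model, and the held-out accuracy would be compared across layers to see where ambiguity is most linearly decodable.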

