AI · Feb 23

Implicit Intelligence -- Evaluating Agents on What Users Don't Say

arXiv:2602.20424v1 · h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses whether AI agents can understand the unstated human needs behind real-world requests. It is an incremental advance: it builds on existing evaluation methods and shifts their focus to implicit constraints.

The paper tackles AI agents' failure to infer implicit requirements in user requests, such as accessibility or privacy constraints, by introducing the Implicit Intelligence evaluation framework and the Agent-as-a-World (AaW) harness. Even the best-performing of the 16 evaluated models passes only 48.3% of the 205 scenarios, highlighting a significant gap in contextual reasoning.
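How a headline figure like 48.3% relates to 205 scenarios can be checked with a little arithmetic. The sketch below assumes all-or-nothing scoring (a scenario passes only if every rubric check, including each implicit constraint, is satisfied) and a single run per scenario. The `ScenarioResult` class and its rubric fields are hypothetical, not taken from the paper. Under those assumptions, 48.3% corresponds to exactly 99 passed scenarios; if the paper averages over several runs per scenario, that reading no longer holds exactly.

```python
from dataclasses import dataclass, field


@dataclass
class ScenarioResult:
    """Hypothetical per-scenario record: rubric check name -> satisfied?"""
    scenario_id: str
    checks: dict[str, bool] = field(default_factory=dict)

    @property
    def passed(self) -> bool:
        # Assumed all-or-nothing rule: missing any one implicit constraint fails the scenario.
        return bool(self.checks) and all(self.checks.values())


def scenario_pass_rate(results: list[ScenarioResult]) -> float:
    """Fraction of scenarios passed."""
    if not results:
        raise ValueError("need at least one scenario")
    return sum(r.passed for r in results) / len(results)


# Which whole number of passed scenarios out of 205 rounds to 48.3%?
TOTAL = 205
matches = [k for k in range(TOTAL + 1) if round(100 * k / TOTAL, 1) == 48.3]
print(matches)  # [99]

# Toy usage: the second scenario meets the explicit request but misses a privacy constraint.
demo = [
    ScenarioResult("s1", {"explicit_goal": True, "privacy": True}),
    ScenarioResult("s2", {"explicit_goal": True, "privacy": False}),
]
print(scenario_pass_rate(demo))  # 0.5
```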

Real-world requests to AI agents are fundamentally underspecified. Natural human communication relies on shared context and unstated constraints that speakers expect listeners to infer. Current agentic benchmarks test explicit instruction-following but fail to evaluate whether agents can reason about implicit requirements spanning accessibility needs, privacy boundaries, catastrophic risks, and contextual constraints. We present Implicit Intelligence, an evaluation framework testing whether AI agents can move beyond prompt-following to become genuine goal-fulfillers, paired with Agent-as-a-World (AaW), a harness where interactive worlds are defined in human-readable YAML files and simulated by language models. Our scenarios feature apparent simplicity in user requests, hidden complexity in correct solutions, and discoverability of constraints through environmental exploration. Evaluating 16 frontier and open-weight models across 205 scenarios, we find that even the best-performing model achieves only 48.3% scenario pass rate, revealing substantial room for improvement in bridging the gap between literal instruction-following and human-like contextual reasoning.
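The scenario pattern in the abstract (simple request, hidden complexity, constraints discoverable by exploration) can be illustrated with a toy world. This is a minimal, hypothetical sketch, not the AaW implementation: the world is a Python dict standing in for an AaW YAML file, the schema, file names, and action names are invented, and a deterministic `simulate` function replaces the language model that AaW uses to simulate the world. The request is "send the Q3 report to #all-hands"; the unstated constraint is that the report holds salary data and the channel includes external guests.

```python
import copy

# Hypothetical world spec. AaW defines worlds in human-readable YAML; a dict
# with the same nesting is used here so the sketch has no dependencies.
WORLD_SPEC = {
    "user_request": "Send the Q3 report to the #all-hands channel.",
    "files": {"q3_report.pdf": {"sections": ["revenue", "employee_salaries"]}},
    "channels": {"#all-hands": {"external_guests": True}},
}
SENSITIVE = {"employee_salaries"}  # hypothetical: data that must not leave the company


def simulate(state: dict, action: str, **kw) -> str:
    """Deterministic stub standing in for the LM that simulates the world."""
    if action == "read_file":
        return ", ".join(state["files"][kw["name"]]["sections"])
    if action == "inspect_channel":
        guests = state["channels"][kw["name"]]["external_guests"]
        return "external guests" if guests else "internal only"
    if action == "post":
        omit = set(kw.get("omit", ()))
        sections = [s for s in state["files"][kw["file"]]["sections"] if s not in omit]
        state["posted"].append((kw["channel"], sections))
        return "posted"
    return f"unknown action: {action}"


def passes(state: dict) -> bool:
    """Explicit goal: something was posted. Implicit: no sensitive data reached external guests."""
    if not state["posted"]:
        return False
    for channel, sections in state["posted"]:
        if state["channels"][channel]["external_guests"] and SENSITIVE & set(sections):
            return False
    return True


def literal_agent(state: dict) -> None:
    # Follows the prompt word for word, without exploring.
    simulate(state, "post", file="q3_report.pdf", channel="#all-hands")


def exploring_agent(state: dict) -> None:
    # Inspects the environment first, which surfaces the unstated privacy constraint.
    sections = simulate(state, "read_file", name="q3_report.pdf").split(", ")
    audience = simulate(state, "inspect_channel", name="#all-hands")
    omit = [s for s in sections if s in SENSITIVE] if audience == "external guests" else []
    simulate(state, "post", file="q3_report.pdf", channel="#all-hands", omit=omit)


def run(agent) -> bool:
    state = copy.deepcopy(WORLD_SPEC)  # fresh world per episode
    state["posted"] = []
    agent(state)
    return passes(state)


print(run(literal_agent), run(exploring_agent))  # False True
```

The literal agent satisfies the explicit request but leaks salary data, so it fails; the exploring agent reads the file and checks the audience before posting, so it passes. Replacing the stub with a model call would bring `simulate` closer to what the abstract describes, at the cost of determinism.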

