CL · May 19

MixRea: Benchmarking Explicit-Implicit Reasoning in Large Language Models

Yuanqing Cai, Ziyi Huang, Minhao Liu, Lixin Duan, Wen Li, Yanru Zhang

arXiv:2605.20128 · 19.2

AI Analysis

Identifies a fundamental cognitive limitation in LLMs for high-stakes decision-making, highlighting the need for more cognitively aligned models.

LLMs exhibit inattentional blindness, failing to attend to subtle contextual cues under explicit instructions. The best model (Gemini 2.5 Pro) achieves only 42.8% consistency on the MixRea benchmark of 2,246 questions.

Large language models (LLMs) are increasingly integrated into high-stakes decision-making. Inspired by the theory of inattentional blindness in human cognition, we investigate whether LLMs, trained on human-preferred corpora that embed attentional biases, exhibit a similar limitation: failing to attend to subtle yet important contextual cues under explicit task instructions. To evaluate this, we introduce the task of explicit-implicit reasoning and present MixRea, a benchmark of 2,246 multiple-choice questions across 9 reasoning types with varying distributions of explicit and implicit information. Evaluation of 21 advanced LLMs shows that even the best-performing reasoning model (Gemini 2.5 Pro) achieves only 42.8% consistency, revealing widespread inattentional blindness. To mitigate this, we propose Potential Relation Completion Prompting (PRCP), a prompting method that improves reasoning by recovering overlooked causal relations. Further analysis shows that this limitation persists across diverse multi-source reasoning tasks, highlighting the need for more cognitively aligned models.
