CL · Jan 14

The Imperfective Paradox in Large Language Models

arXiv:2601.09373v10.6

Originality: Highly original

AI Analysis

This work addresses a foundational question in AI interpretability that matters to both researchers and developers: it exposes limits in LLMs' logical reasoning. Its method is incremental, however, since it builds on existing diagnostic techniques.

The study asks whether Large Language Models (LLMs) understand event semantics, using the Imperfective Paradox as a probe. It finds that models exhibit a Teleological Bias: they hallucinate completion for goal-oriented events even when the text explicitly negates it. Prompting interventions reduce these hallucinations but increase incorrect rejections of valid entailments.

Do Large Language Models (LLMs) genuinely grasp the compositional semantics of events, or do they rely on surface-level probabilistic heuristics? We investigate the Imperfective Paradox, a logical phenomenon where the past progressive aspect entails event realization for activities (e.g., running $\to$ ran) but not for accomplishments (e.g., building $\nrightarrow$ built). We introduce ImperfectiveNLI, a diagnostic dataset designed to probe this distinction across diverse semantic classes. Evaluating state-of-the-art open-weight models, we uncover a pervasive Teleological Bias: models systematically hallucinate completion for goal-oriented events, often overriding explicit textual negation. Representational analyses show that while internal embeddings often distinguish process from result, inference decisions are dominated by strong priors about goal attainment. We further find that prompting-based interventions reduce hallucinated completions but also increase incorrect rejections of valid entailments. Our findings suggest that current LLMs lack structural aspectual awareness, operating as predictive narrative engines rather than faithful logical reasoners.
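To make the entailment pattern in the abstract concrete, here is a minimal Python sketch of how such a probe could be framed as an NLI task. It is an illustration under stated assumptions, not the ImperfectiveNLI format or the authors' evaluation code: the `Item` fields, the two-way label set, the sentence templates, and the `make_item` and `error_rates` helpers are all hypothetical. It encodes only the rule stated above (past progressive entails realization for activities, not for accomplishments) and separates the two error types the abstract mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One hypothetical probe item (not the actual ImperfectiveNLI schema)."""
    premise: str
    hypothesis: str
    gold: str          # "entailment" or "non-entailment" (assumed label set)
    event_class: str   # "activity" or "accomplishment"


def make_item(subject: str, progressive: str, perfective: str, event_class: str) -> Item:
    """Build a past-progressive premise and a simple-past hypothesis.

    Gold label follows the Imperfective Paradox: the progressive entails
    realization for activities but not for accomplishments.
    """
    gold = "entailment" if event_class == "activity" else "non-entailment"
    return Item(
        premise=f"{subject} was {progressive}.",
        hypothesis=f"{subject} {perfective}.",
        gold=gold,
        event_class=event_class,
    )


def error_rates(items: list[Item], predictions: list[str]) -> dict[str, float]:
    """Return the two error rates discussed in the abstract.

    hallucinated_completion: share of non-entailed items predicted as entailment.
    incorrect_rejection: share of entailed items not predicted as entailment.
    """
    pairs = list(zip(items, predictions))
    non_entailed = [p for i, p in pairs if i.gold == "non-entailment"]
    entailed = [p for i, p in pairs if i.gold == "entailment"]
    hallucinated = (
        sum(p == "entailment" for p in non_entailed) / len(non_entailed)
        if non_entailed else 0.0
    )
    rejected = (
        sum(p != "entailment" for p in entailed) / len(entailed)
        if entailed else 0.0
    )
    return {"hallucinated_completion": hallucinated, "incorrect_rejection": rejected}


# Two illustrative items based on the abstract's own examples.
ITEMS = [
    make_item("Mary", "running", "ran", "activity"),
    make_item("Mary", "building a house", "built a house", "accomplishment"),
]
```

Under this framing, a model that always answers "entailment" on `ITEMS` would show the teleological pattern (every accomplishment completed), while one that always answers "non-entailment" would trade that error for incorrect rejections of valid activity entailments.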
