CL · AI · Mar 9

Can LLMs Perceive Time? An Empirical Investigation

arXiv:2604.00010 · 1 citation

Predicted impact: top 9% in CL (last 90 days) · Originality: Incremental advance

AI Analysis

This work addresses a practical limitation for AI agents in scheduling, planning, and time-critical scenarios, although its contribution is incremental: it highlights one specific failure mode.

The paper investigates the inability of large language models to accurately estimate how long their own tasks take. It finds that pre-task estimates overshoot actual duration by 4–7×, that relative-ordering judgments score at or below chance, and that errors persist in multi-step settings, where estimates diverge from actuals by 5–10×.
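The "4–7×" overshoot is a ratio of predicted to measured duration. The sketch below shows one way such a factor could be computed, assuming paired per-task estimates and measured wall-clock times in seconds. The function names `overshoot_factor` and `timed` are hypothetical, and aggregating ratios with a geometric mean is an assumption for illustration, not necessarily the authors' method.

```python
import math
import time
from typing import Callable, Sequence


def overshoot_factor(estimates_s: Sequence[float], actuals_s: Sequence[float]) -> float:
    """Geometric mean of estimate/actual ratios (hypothetical aggregation choice).

    A result of 5.0 means predictions are, on average, 5x longer than reality.
    Averaging in log space treats a 10x overshoot and a 10x undershoot symmetrically.
    """
    if len(estimates_s) != len(actuals_s) or not estimates_s:
        raise ValueError("need equal-length, non-empty sequences")
    log_ratios = [math.log(e / a) for e, a in zip(estimates_s, actuals_s)]
    return math.exp(sum(log_ratios) / len(log_ratios))


def timed(task: Callable[[], object]) -> float:
    """Measure the wall-clock duration of a task in seconds (hypothetical helper)."""
    start = time.perf_counter()
    task()
    return time.perf_counter() - start
```

For example, estimates of 120 s for two tasks that actually took 20 s and 30 s give ratios of 6 and 4, and an overshoot factor of √24 ≈ 4.9.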

Large language models cannot estimate how long their own tasks take. We investigate this limitation through four experiments across 68 tasks and four model families. Pre-task estimates overshoot actual duration by 4–7× (p < 0.001), with models predicting human-scale minutes for tasks completing in seconds. Relative ordering fares no better: on task pairs designed to expose heuristic reliance, models score at or below chance (GPT-5: 18% on counter-intuitive pairs, p = 0.033), systematically failing when complexity labels mislead. Post-hoc recall is disconnected from reality: estimates diverge from actuals by an order of magnitude in either direction. These failures persist in multi-step agentic settings, with errors of 5–10×. The models possess propositional knowledge about duration from training but lack experiential grounding in their own inference time, with practical implications for agent scheduling, planning and time-critical scenarios.
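The relative-ordering result compares a model's accuracy on task pairs with the 50% expected from guessing. A minimal sketch of that comparison follows, assuming each pair is scored as correct or incorrect and using an exact two-sided binomial test written with the standard library. The names `ordering_accuracy` and `binomial_two_sided_p` are hypothetical, and the abstract does not say which statistical test the authors used.

```python
from math import comb
from typing import Sequence


def ordering_accuracy(predicted_longer: Sequence[int], actual_longer: Sequence[int]) -> float:
    """Fraction of task pairs where the model picked the truly longer task.

    Each entry is 0 or 1: the index, within the pair, of the task judged
    (or measured) to take longer.
    """
    if len(predicted_longer) != len(actual_longer) or not predicted_longer:
        raise ValueError("need equal-length, non-empty sequences")
    hits = sum(p == a for p, a in zip(predicted_longer, actual_longer))
    return hits / len(predicted_longer)


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test of k successes in n trials against rate p.

    Sums the probabilities of every outcome no more likely than the observed one.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so floating-point ties count as "as extreme".
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-7)))
```

For instance, 0 correct answers on 10 pairs gives p = 2/1024 ≈ 0.002 against a 50% chance rate. Because the claim is "at or below chance", a one-sided variant summing only the lower tail, `pmf[0..k]`, is an equally reasonable choice.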
