CL · LG · May 29

Cognitive Fatigue in Autoregressive Transformers: Formalization and Measurement

arXiv:2605.30981 · 67.5 · h-index: 2
AI Analysis

This work gives practitioners a principled, real-time diagnostic for monitoring and detecting performance degradation in production LLM systems, a critical reliability concern.

This paper formalizes "cognitive fatigue" in autoregressive language models: degradation during long-horizon generation that leads to repetitive text and loss of instruction adherence. The authors introduce the Fatigue Index (FI), a lightweight, model-agnostic diagnostic that predicts task degradation with an AUROC of 0.95 and repetition with a Spearman rho of 0.94 across nine models (1B-13B parameters).
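The two headline numbers measure different things. AUROC is the probability that a randomly chosen degraded generation receives a higher FI score than a randomly chosen healthy one. Spearman rho measures rank agreement between FI and a repetition measure. The following is a minimal pure-Python sketch of both metrics, written only to make the numbers concrete. The function names are hypothetical, and nothing here reproduces the paper's evaluation pipeline. In practice, `sklearn.metrics.roc_auc_score` and `scipy.stats.spearmanr` compute the same quantities.

```python
import math
from typing import List, Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random degraded example (label 1) scores higher than a
    random non-degraded one (label 0); ties count as half. This is the
    Mann-Whitney formulation of the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

An AUROC of 0.5 means FI separates degraded from healthy generations no better than chance. A value of 1.0 means every degraded generation outranks every healthy one.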

Autoregressive language models frequently degrade during long-horizon generation, producing repetitive text, losing instruction adherence, and exhibiting unstable entropy. Despite the prevalence of these failures, practitioners lack online diagnostics that detect them as they occur. We formalize this degradation as cognitive fatigue, a measurable generation-time state characterized by decaying attention to the original prompt, representational drift, and entropy miscalibration. We introduce the Fatigue Index (FI), a lightweight, model-agnostic diagnostic that aggregates these three signals under explicit axioms (monotonicity, boundedness, interpretability), enabling reliable runtime monitoring. Across nine models (1B-13B parameters), FI trajectories exhibit structured temporal dynamics, predict task degradation (AUROC = 0.95) and repetition (Spearman rho = 0.94), and reveal non-monotonic scaling behavior: below 3B, instruction-tuned models collapse faster than base models, and this trend reverses at 7B. Stress analyses further show that FI onset accelerates under longer contexts, middle-positioned evidence, and reduced numerical precision. These results establish cognitive fatigue as a coherent, measurable phenomenon and position FI as a principled tool for runtime reliability monitoring in production LLM systems.
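The abstract names FI's three ingredients and its axioms but does not give the formulas. The following is therefore a hypothetical sketch of how such an index could be assembled. Each of the following is an illustrative assumption, not the paper's definition:

- every per-step signal is normalized to [0, 1];
- attention decay is measured as lost prompt attention mass;
- drift is measured as cosine distance from the initial hidden state;
- miscalibration is measured as an entropy gap against a reference;
- the signals are combined by a convex (non-negative, sum-to-one) weighting, with equal weights by default;
- onset is defined as the first step where FI crosses a 0.5 threshold.

A convex combination is one simple way to satisfy all three stated axioms at once.

```python
import math
from typing import Optional, Sequence, Tuple

# HYPOTHETICAL sketch: component formulas, equal weights, and the 0.5 onset
# threshold are illustrative assumptions, not the paper's definitions.


def prompt_attention_decay(prompt_mass: float, initial_prompt_mass: float) -> float:
    """Fraction of the initial attention mass on prompt tokens that has been lost, in [0, 1]."""
    if initial_prompt_mass <= 0:
        return 0.0
    return min(1.0, max(0.0, 1.0 - prompt_mass / initial_prompt_mass))


def representational_drift(h_t: Sequence[float], h_0: Sequence[float]) -> float:
    """Cosine distance between the current and initial hidden state, rescaled to [0, 1]."""
    dot = sum(a * b for a, b in zip(h_t, h_0))
    norm = math.sqrt(sum(a * a for a in h_t)) * math.sqrt(sum(b * b for b in h_0))
    if norm == 0:
        return 0.0
    cos = max(-1.0, min(1.0, dot / norm))
    return (1.0 - cos) / 2.0


def entropy_miscalibration(probs: Sequence[float], reference_entropy: float) -> float:
    """|H(p) - reference| normalized by the maximum entropy log(V), clipped to [0, 1].

    The absolute gap penalizes both collapse (entropy too low) and
    instability (entropy too high).
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    max_entropy = math.log(len(probs)) if len(probs) > 1 else 1.0
    return min(1.0, abs(entropy - reference_entropy) / max_entropy)


def fatigue_index(
    decay: float,
    drift: float,
    miscalibration: float,
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> float:
    """Convex combination of the three signals.

    Non-negative weights summing to 1 keep FI in [0, 1] (boundedness), make it
    non-decreasing in every signal (monotonicity), and make each weighted term
    an additive, readable contribution (interpretability).
    """
    signals = (decay, drift, miscalibration)
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    if any(not 0.0 <= s <= 1.0 for s in signals):
        raise ValueError("each signal must lie in [0, 1]")
    return sum(w * s for w, s in zip(weights, signals))


def fatigue_onset(fi_trajectory: Sequence[float], threshold: float = 0.5) -> Optional[int]:
    """First generation step at which FI reaches the threshold, or None if it never does."""
    for step, fi in enumerate(fi_trajectory):
        if fi >= threshold:
            return step
    return None
```

In a monitoring loop, one would compute the three signals at each decoding step, append `fatigue_index(...)` to a trajectory, and flag the generation once `fatigue_onset` returns a step. Under this sketch, "FI onset accelerates" would show up as a smaller returned step.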
