Revisiting the UID Hypothesis in LLM Reasoning Traces
The finding challenges assumptions about machine reasoning and suggests new directions for interpretable models, but the contribution is incremental: it builds on the existing UID hypothesis and on prior analyses of reasoning traces.
The study addresses unfaithful or hard-to-interpret reasoning traces in large language models by analyzing information flow with entropy-based metrics. It finds that successful reasoning is globally non-uniform, marked by uneven swings in information density, in contrast to human communication patterns.
Large language models (LLMs) often solve problems using step-by-step Chain-of-Thought (CoT) reasoning, yet these intermediate steps are frequently unfaithful or hard to interpret. Inspired by the Uniform Information Density (UID) hypothesis in psycholinguistics -- which posits that humans communicate by maintaining a stable flow of information -- we introduce entropy-based metrics to analyze the information flow within reasoning traces. Surprisingly, across three challenging mathematical benchmarks, we find that successful reasoning in LLMs is globally non-uniform: correct solutions are characterized by uneven swings in information density, in stark contrast to human communication patterns. This result challenges assumptions about machine reasoning and suggests new directions for designing interpretable and adaptive reasoning models.
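The abstract does not spell out the entropy-based metrics, so the following is only a minimal sketch of what a step-level information-density analysis might look like, not the paper's definitions. It assumes each reasoning step comes with per-token natural-log probabilities from the model. The function names (`step_surprisal`, `global_nonuniformity`, `local_nonuniformity`) and the choice of variance and squared step-to-step change as uniformity scores are illustrative assumptions.

```python
import math
from statistics import pvariance


def step_surprisal(token_logprobs):
    """Mean surprisal of one reasoning step, in bits per token.

    Assumption: token_logprobs holds natural-log probabilities that the
    model assigned to the tokens it actually generated in this step.
    """
    if not token_logprobs:
        raise ValueError("a step needs at least one token")
    return -sum(token_logprobs) / (len(token_logprobs) * math.log(2))


def global_nonuniformity(step_densities):
    """Population variance of step densities (hypothetical global score).

    A value of 0 means every step carries the same information density;
    larger values mean the trace is less uniform overall.
    """
    return pvariance(step_densities)


def local_nonuniformity(step_densities):
    """Mean squared change between consecutive steps (hypothetical local score)."""
    diffs = [(b - a) ** 2 for a, b in zip(step_densities, step_densities[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0


# Toy trace with three steps; the probabilities are made up for illustration.
trace = [
    [math.log(0.5), math.log(0.5)],
    [math.log(0.25)],
    [math.log(0.5), math.log(0.125)],
]
densities = [step_surprisal(step) for step in trace]
global_score = global_nonuniformity(densities)
local_score = local_nonuniformity(densities)
print(densities, global_score, local_score)
```

Under these assumptions, a perfectly UID-like trace would score near zero on both measures, whereas the finding reported above would correspond to correct solutions showing a comparatively high global score.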