Reasoning Models Don't Just Think Longer, They Move Differently

Anders Gjølbye, Lars Kai Hansen, Sanmi Koyejo

arXiv:2605.15454


AI Analysis

This work provides a methodological correction (length residualization) for analyzing hidden-state trajectories during reasoning. It shows that reasoning training can be associated with domain-dependent changes in internal dynamics, which is important for understanding how reasoning emerges in LLMs.

The study investigates whether reasoning-trained language models follow different internal trajectories during chain-of-thought generation, rather than simply generating more tokens. After correcting for generation length, the authors find that reasoning models exhibit distinct trajectory geometry, most clearly in the code domain, where the geometry is more strongly coupled to problem difficulty than in matched instruction-tuned baselines.

Reasoning-trained language models often spend more tokens on harder problems, but longer chains of thought do not show whether a model is merely computing for more steps or following a different internal trajectory. We study this distinction through hidden-state trajectories during chain-of-thought generation across competitive programming, mathematics, and Boolean satisfiability. Raw trajectory geometry is strongly shaped by generation length: longer generations mechanically alter path statistics, so difficulty-dependent comparisons are misleading without adjustment. After residualizing trajectory statistics on length, difficulty remains systematically coupled to corrected trajectory geometry across all domains studied. The clearest reasoning-specific separation appears in the code domain, where harder problems show more direct corrected trajectories and less heterogeneous local curvature in reasoning-trained models than in matched instruction-tuned baselines. Corrected difficulty-geometry coupling is weaker, but still present, in mathematics and Boolean satisfiability. Prompt-stage linear probes do not mirror the code-domain separation, and behavioral annotations show that stronger corrected coupling co-occurs with strategy shifts and uncertainty monitoring. Together, these findings establish length correction as a prerequisite for generation-time trajectory analysis and show that reasoning training can be associated with distinct corrected trajectory geometry, with the strength of the effect depending on the domain.
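The length correction described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the regression form or the exact trajectory statistics, so everything here is an assumption: `directness` is a hypothetical path statistic (net displacement over total path length), and `residualize_on_length` fits an ordinary least-squares line on log generation length and returns the residuals.

```python
import numpy as np


def directness(hidden_states):
    """Hypothetical path statistic: net displacement / total path length.

    hidden_states is a (T, d) array of per-token hidden states.
    A value of 1.0 means the trajectory is a straight line.
    """
    h = np.asarray(hidden_states, dtype=float)
    steps = np.diff(h, axis=0)                      # (T-1, d) step vectors
    path_len = np.linalg.norm(steps, axis=1).sum()  # total distance travelled
    if path_len == 0.0:
        return 0.0
    return float(np.linalg.norm(h[-1] - h[0]) / path_len)


def residualize_on_length(stat, length):
    """Remove the part of `stat` that is linearly predictable from log(length).

    Assumption: a linear fit on log length; the paper may use another form.
    Returns the OLS residuals, one per generation.
    """
    stat = np.asarray(stat, dtype=float)
    log_len = np.log(np.asarray(length, dtype=float))
    X = np.column_stack([np.ones(len(stat)), log_len])  # intercept + slope
    beta, *_ = np.linalg.lstsq(X, stat, rcond=None)
    return stat - X @ beta
```

Under these assumptions, the residuals are orthogonal to log length by construction, so any remaining association between the corrected statistic and problem difficulty (for example, a rank correlation) cannot be attributed to that linear length trend alone.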
