AI · May 27

The Shape of Overthinking: Backtracking Bursts in Long Reasoning Traces

CMU

arXiv:2605.27965 · 23.1 · h-index: 11

AI Analysis

For developers of reasoning models, this provides a deployable method to distinguish productive self-correction from unproductive revision in long traces.

The paper studies backtracking dynamics in long reasoning traces generated by Qwen3-8B on AIME problems. It finds that early, isolated repairs are often compatible with correct reasoning, whereas moderate-to-severe backtracks that persist and cluster late are more common in incorrect traces. At shallow and intermediate depths, a burst-aware early-exit policy outperforms fixed length-based filtering at detecting instability.

Reasoning models often generate long traces in which useful self-correction and unproductive revision are hard to distinguish. We study this distinction through backtracking dynamics: local reconsideration, retraction, or re-derivation inside long-form reasoning traces. On 6,000 Qwen3-8B AIME traces, we annotate segment-level backtrack severity and analyze event timing, normalized depth, and local burst structure. We find that early isolated repair is often compatible with correct reasoning, whereas incorrect traces more often show moderate-to-severe backtracks that persist and cluster late. Cross-corpus checks show the same qualitative asymmetry across additional model/domain pairs. Filtering analyses instantiate the signal as a prefix-causal selective early-exit policy: at shallow and intermediate depths, burst-aware filtering outperforms fixed length-based filtering while using only prefix-available features. Moderate length cutoffs remain strong completed-trace baselines, but burst-aware control provides a deployable mechanism for separating recoverable repair from likely instability.
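The abstract does not specify the severity scale, the window size, or the exact exit rule. As a minimal sketch under stated assumptions (a hypothetical per-segment severity scale from 0 to 3, a sliding window of 4 segments, and a "burst" defined as two or more moderate-to-severe backtracks inside that window), the Python below contrasts a prefix-causal, burst-aware exit rule with a fixed length cutoff. All names (`BurstExitPolicy`, `normalized_depths`, `length_filter_exit`) and all thresholds are hypothetical illustrations, not the paper's implementation.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical severity scale (assumption, not from the paper):
# 0 = no backtrack, 1 = minor repair, 2 = moderate, 3 = severe.
MODERATE = 2


def normalized_depths(severities: Sequence[int]) -> list[float]:
    """Post-hoc only: relative position in [0, 1] of each backtrack event.

    Needs the full trace length, so it is NOT prefix-available.
    """
    n = len(severities)
    return [i / (n - 1) if n > 1 else 0.0 for i, s in enumerate(severities) if s > 0]


@dataclass
class BurstExitPolicy:
    """Prefix-causal early exit: reads segments in order, never looks ahead."""

    window: int = 4     # hypothetical: number of recent segments inspected
    min_burst: int = 2  # hypothetical: moderate+ backtracks that form a burst

    def first_exit(self, severities: Sequence[int]) -> Optional[int]:
        """Index of the segment at which the policy exits, or None."""
        recent: deque = deque(maxlen=self.window)
        for t, s in enumerate(severities):
            recent.append(s)
            # An isolated minor repair never triggers; clustered
            # moderate-to-severe backtracks inside the window do.
            if sum(1 for x in recent if x >= MODERATE) >= self.min_burst:
                return t
        return None


def length_filter_exit(severities: Sequence[int], max_segments: int) -> Optional[int]:
    """Fixed length-based baseline: exit once the trace exceeds a segment budget."""
    return max_segments if len(severities) > max_segments else None


if __name__ == "__main__":
    early_repair = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # one isolated minor repair
    late_burst = [0, 0, 0, 0, 0, 0, 2, 0, 3, 2]    # backtracks cluster late
    policy = BurstExitPolicy()
    print(policy.first_exit(early_repair))      # None
    print(policy.first_exit(late_burst))        # 8
    print(length_filter_exit(early_repair, 8))  # 8
    print(length_filter_exit(late_burst, 8))    # 8
```

On these two toy traces of equal length, the length cutoff flags both, while the burst rule flags only the trace whose backtracks cluster late. `normalized_depths` is included only for post-hoc timing analysis; the exit rule itself uses nothing beyond the prefix seen so far, which is what makes it prefix-causal.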

