CL | Oct 11, 2025

Stop When Enough: Adaptive Early-Stopping for Chain-of-Thought Reasoning

Renliang Sun, Wei Cheng, Dawei Li, Haifeng Chen, Wei Wang

arXiv:2510.10103v1 | 16.39 citations | h-index: 13

Originality: Incremental advance

AI Analysis

The work addresses a practical efficiency problem for users of large language models on reasoning tasks, though the advance is incremental because it builds on existing CoT methods.

The paper tackles overthinking in Chain-of-Thought reasoning for large language models, which increases inference costs and can lead to errors. It introduces REFRAIN, a training-free framework that adaptively decides when to stop reasoning, and reports a 20-55% reduction in token usage while maintaining or improving accuracy across benchmarks.

Chain-of-Thought (CoT) reasoning has driven recent gains of large language models (LLMs) on reasoning-intensive tasks by externalizing intermediate steps. However, excessive or redundant reasoning -- so-called overthinking -- can increase inference costs and lead LLMs toward incorrect conclusions. In this paper, we present REFRAIN (REFlective-Redundancy for Adaptive INference), a training-free framework that adaptively determines when to stop reasoning to mitigate overthinking. REFRAIN integrates a two-stage stop discriminator to identify reflective yet redundant reasoning and a sliding-window Upper Confidence Bound (SW-UCB) multi-armed bandit controller to dynamically adjust stopping thresholds according to problem difficulty without supervision or fine-tuning. Across four representative benchmarks and two model families, REFRAIN reduces token usage by 20-55% while maintaining or improving accuracy compared to standard CoT prompting. Extensive ablation and robustness analyses demonstrate its stability across models, scorers, and prompt variations. In summary, our findings highlight when-to-stop as a new and practical axis of test-time scaling -- enabling models to reason not just more, but just enough.
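
To make the SW-UCB idea concrete, here is a minimal Python sketch of a generic sliding-window UCB bandit that picks among candidate stopping thresholds. It is not REFRAIN's actual controller: the abstract does not specify the arms, the reward signal, the window size, or the exploration constant, so the class `SlidingWindowUCB`, the helper `simulate`, and the parameters `window` and `c` are all hypothetical names and values chosen for illustration.

```python
import math
from collections import deque


class SlidingWindowUCB:
    """Generic sliding-window UCB over a discrete set of arms.

    Assumption: each arm is a candidate stopping threshold. The reward
    definition is left to the caller and is NOT taken from the paper.
    """

    def __init__(self, arms, window=50, c=1.0):
        self.arms = list(arms)
        self.c = c  # exploration constant (hypothetical default)
        # Only the most recent `window` (arm_index, reward) pairs are kept,
        # so older observations stop influencing the estimates.
        self.history = deque(maxlen=window)

    def select(self):
        """Return the index of the arm to play next."""
        counts = [0] * len(self.arms)
        sums = [0.0] * len(self.arms)
        for i, r in self.history:
            counts[i] += 1
            sums[i] += r
        # Any arm with no observation inside the window is played first.
        for i, n in enumerate(counts):
            if n == 0:
                return i
        t = len(self.history)  # equals min(rounds played, window)
        scores = [
            sums[i] / counts[i] + self.c * math.sqrt(math.log(t) / counts[i])
            for i in range(len(self.arms))
        ]
        return max(range(len(self.arms)), key=scores.__getitem__)

    def update(self, arm_index, reward):
        """Record the reward observed for the arm that was played."""
        self.history.append((arm_index, reward))


def simulate(bandit, reward_fn, rounds):
    """Run the select/update loop and return the list of chosen arm indices."""
    picks = []
    for _ in range(rounds):
        i = bandit.select()
        bandit.update(i, reward_fn(bandit.arms[i]))
        picks.append(i)
    return picks
```

In use, one would call `select()` before each problem to choose a threshold, run the reasoning with that threshold, and then call `update()` with a reward of one's own design (for example, one that trades off answer agreement against tokens spent). Because only the last `window` observations count, the preferred threshold can drift as problem difficulty changes.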
