CL · AI · Oct 10, 2025

Mitigating Overthinking through Reasoning Shaping

Peking U
arXiv:2510.09535v1 · 1 citation · h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses efficiency issues in reasoning models for AI applications, representing an incremental improvement over prior penalization methods.

The paper tackles the problem of overthinking in large reasoning models, where excessive reasoning inflates computational cost, by proposing Group Relative Segment Penalization (GRSP), a step-level method that achieves superior token efficiency without heavily compromising accuracy, especially on harder problems.

Large reasoning models (LRMs) boosted by Reinforcement Learning from Verifier Reward (RLVR) have shown great power in problem solving, yet they often overthink: they produce excessive, meandering reasoning that inflates computational cost. Prior penalization designs in RLVR reduce token consumption but often harm model performance, a shortcoming that stems from the oversimplicity of token-level supervision. In this paper, we argue that the granularity of supervision plays a crucial role in balancing efficiency and accuracy, and we propose Group Relative Segment Penalization (GRSP), a step-level method to regularize reasoning. Since preliminary analyses show that reasoning segments are strongly correlated with token consumption and model performance, we design a length-aware weighting mechanism across segment clusters. Extensive experiments demonstrate that GRSP achieves superior token efficiency without heavily compromising accuracy, with its advantages most pronounced on harder problems. Moreover, GRSP stabilizes RL training and scales effectively across model sizes.
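The abstract names the ingredients of GRSP (step-level segments, length-aware weights over segment clusters, a group-relative penalty) but does not give the formula. The Python below is a minimal sketch of how such a reward could be shaped, under explicitly hypothetical choices: segments are blank-line-separated chunks, clusters are word-count buckets with made-up weights, and the penalty is the z-score of a response's weighted segment count within its rollout group, scaled by an assumed coefficient `alpha`. None of these names or values come from the paper.

```python
from statistics import mean, pstdev

# HYPOTHETICAL segment clusters: word-count bucket edges and per-bucket weights.
# The paper's real clustering and weighting are not stated in the abstract.
BUCKET_EDGES = (20, 60)            # <20 words: short, 20-59: medium, >=60: long
BUCKET_WEIGHTS = (1.0, 0.5, 0.25)  # assumed: short, fragmentary steps cost most


def split_segments(reasoning: str) -> list[str]:
    """Treat blank-line-separated chunks as reasoning steps (an assumption)."""
    return [s.strip() for s in reasoning.split("\n\n") if s.strip()]


def bucket_of(segment: str) -> int:
    """Return the index of the length bucket a segment falls into."""
    n_words = len(segment.split())
    for idx, edge in enumerate(BUCKET_EDGES):
        if n_words < edge:
            return idx
    return len(BUCKET_EDGES)


def weighted_segment_count(reasoning: str) -> float:
    """Sum of per-segment weights, so segment length changes the cost of a step."""
    return sum(BUCKET_WEIGHTS[bucket_of(s)] for s in split_segments(reasoning))


def shaped_rewards(responses: list[str], correct: list[int],
                   alpha: float = 0.1, eps: float = 1e-6) -> list[float]:
    """Correctness reward minus a group-relative (z-scored) segment penalty.

    Responses with more weighted segments than their group's mean are
    penalized; those with fewer receive a bonus of the same scale.
    """
    counts = [weighted_segment_count(r) for r in responses]
    mu, sigma = mean(counts), pstdev(counts)
    return [float(c) - alpha * (n - mu) / (sigma + eps)
            for c, n in zip(correct, counts)]
```

For two correct rollouts with 2 and 4 short segments, the sketch yields rewards of roughly 1.1 and 0.9: the shorter chain of reasoning is preferred, and because the penalty is centered within the group it redistributes reward rather than lowering the group total.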
