AI · May 27, 2025

Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models

Sohyun An, Ruochen Wang, Tianyi Zhou, Cho-Jui Hsieh

arXiv:2505.21765v1 · 21.312 citations · h-index: 14

Originality: Incremental advance

AI Analysis

This paper addresses the efficiency and accuracy of large reasoning models on mathematical reasoning tasks and represents an incremental improvement over existing reinforcement learning approaches.

The paper tackles large reasoning models' tendency to generate unnecessarily long reasoning paths that waste computation. It proposes a dynamic optimization framework that segments reasoning paths into thinking patterns and optimizes which patterns to keep. The optimized paths reduce attention FLOPs by up to 47% while maintaining accuracy on originally correct responses, and they turn enough originally incorrect responses into correct ones to yield a 15.6% accuracy improvement. Training with preference optimization on pairs of suboptimal and optimized paths then gives up to a 12% overall accuracy gain while reducing token usage from about 5,000 to 3,000 tokens.
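Attention FLOPs can fall faster than the token count because, during decoding, every new token attends over all earlier positions. The rough estimate below illustrates this, assuming standard decoder-only attention with a KV cache and about 4·d_model FLOPs per attended position per layer. The function names and example sizes (a 200-token prompt, d_model = 4096, 32 layers) are illustrative assumptions, not figures from the paper.

```python
def attention_flops(prompt_len: int, gen_len: int, d_model: int, n_layers: int) -> int:
    """Rough attention-only FLOPs for generating gen_len tokens after a prompt.

    Assumptions (illustrative, not from the paper): decoding uses a KV cache,
    generated token t attends over prompt_len + t positions, and each attended
    position costs about 2 * d_model FLOPs for the QK^T dot product plus
    2 * d_model for the weighted sum over V, in every layer. Linear-cost
    projections and MLPs are ignored.
    """
    # Sum over t = 1..gen_len of (prompt_len + t), in closed form.
    attended = gen_len * prompt_len + gen_len * (gen_len + 1) // 2
    return 4 * d_model * n_layers * attended


def relative_reduction(before: int, after: int) -> float:
    """Fraction of FLOPs saved when going from `before` to `after`."""
    return 1 - after / before


# Hypothetical model sizes (they cancel out of the ratio) and prompt length.
long_path = attention_flops(prompt_len=200, gen_len=5000, d_model=4096, n_layers=32)
short_path = attention_flops(prompt_len=200, gen_len=3000, d_model=4096, n_layers=32)
print(f"{relative_reduction(long_path, short_path):.1%}")  # prints 62.2%
```

Under these assumptions, a 40% cut in generated tokens (5,000 to 3,000) removes about 62% of attention FLOPs, since both the number of decoding steps and the context each step attends over shrink. This only illustrates the scaling; the paper's 47% figure is its own measurement on optimized thinking paths and is not derived here.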

While the recent success of large reasoning models (LRMs) has significantly advanced LLMs' reasoning capability by optimizing final-answer accuracy with reinforcement learning, these models may also drastically increase output length due to overthinking, characterized by unnecessarily complex reasoning paths that waste computation and can degrade performance. We hypothesize that such inefficiencies stem from LRMs' limited capability to dynamically select the proper modular reasoning strategies, termed thinking patterns, at the right position. To investigate this hypothesis, we propose a dynamic optimization framework that segments model-generated reasoning paths into distinct thinking patterns, systematically identifying and promoting beneficial patterns that improve the answer while removing detrimental ones. Empirical analysis confirms that our optimized thinking paths yield more concise yet sufficiently informative trajectories, enhancing reasoning efficiency by reducing attention FLOPs by up to 47% while maintaining accuracy for originally correct responses. Moreover, a non-trivial portion of originally incorrect responses is transformed into correct ones, achieving a 15.6% accuracy improvement with reduced length. Motivated by the improvement brought by the optimized thinking paths, we apply a preference optimization technique supported by a pairwise dataset contrasting suboptimal and optimal reasoning paths. Experimental evaluations across multiple mathematical reasoning benchmarks reveal that our method notably reduces computational overhead while simultaneously improving reasoning accuracy, achieving up to a 12% accuracy improvement and reducing token usage from approximately 5,000 to 3,000 tokens.
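The abstract describes two stages: editing sampled reasoning paths at the level of thinking patterns, then preference optimization on pairs that contrast suboptimal and optimized paths. The sketch below illustrates both, under explicit assumptions. `prune_segments`, `make_pair`, `dpo_loss`, and the toy verifier are hypothetical names. The greedy drop-one-segment search covers only the removal of detrimental patterns and is not the authors' algorithm. Because the abstract names only "a preference optimization technique", Direct Preference Optimization (DPO) is swapped in as one common choice.

```python
import math
from typing import Callable, List, Optional, Tuple

Path = List[str]  # one reasoning path, split into thinking-pattern segments


def prune_segments(segments: Path, is_correct: Callable[[Path], bool]) -> Path:
    """Greedily drop segments whose removal still yields a correct answer.

    Hypothetical illustration of the "remove detrimental patterns" step only;
    `is_correct` is an assumed verifier for the answer a (possibly edited)
    path reaches. Promoting beneficial patterns is not modeled here.
    """
    kept = list(segments)
    i = 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        if candidate and is_correct(candidate):
            kept = candidate  # segment i was unnecessary; retry at the same index
        else:
            i += 1  # segment i is needed to keep the answer correct
    return kept


def make_pair(prompt: str, segments: Path,
              is_correct: Callable[[Path], bool]) -> Optional[Tuple[str, str, str]]:
    """Return (prompt, chosen, rejected) when pruning gives a shorter correct path."""
    pruned = prune_segments(segments, is_correct)
    if len(pruned) < len(segments) and is_correct(pruned):
        return prompt, "\n".join(pruned), "\n".join(segments)
    return None


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one pair from summed sequence log-probabilities:
    -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)]).
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(m)) = softplus(-m), computed without overflow for large |m|
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Toy verifier (hypothetical): correct iff the key step and final answer survive.
def toy_is_correct(path: Path) -> bool:
    return "6 * 7 = 42" in path and path[-1] == "Answer: 42"


original = [
    "Restate: we need 6 times 7.",
    "Wait, maybe I misread the question.",
    "6 * 7 = 42",
    "Let me double-check by adding 6 seven times.",
    "Answer: 42",
]
pair = make_pair("What is 6 * 7?", original, toy_is_correct)
print(pair[1])  # prints "6 * 7 = 42" and "Answer: 42" on two lines
print(dpo_loss(-12.0, -30.0, -14.0, -29.0))  # hypothetical log-probabilities
```

Greedy single-segment removal is used here because it is easy to follow and only keeps edits the verifier accepts. In practice the log-probabilities fed to the loss would come from the policy being trained and a frozen reference model, and the verifier would check the final answer against ground truth.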

