LG AI, Mar 6, 2025

DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models

Yi Shen, Jian Zhang, Jieyun Huang, Shuming Shi, Wenjing Zhang, Jiangze Yan, Ning Wang, Kai Wang, Zhaoxiang Liu, Shiguo Lian

arXiv:2503.04472v247.1143 citations | h-index: 5 | Has Code | EMNLP

Originality: Incremental advance

AI Analysis

This work addresses computational inefficiency in large reasoning models, a concern for AI researchers and practitioners, though it is an incremental improvement over existing slow-thinking methods.

The paper tackles overthinking in slow-thinking reasoning models, where models generate redundant reasoning steps for simple problems. It introduces DAST, a framework that adapts reasoning length to problem difficulty, reducing token usage by over 30% on average while maintaining accuracy on complex tasks.

Recent slow-thinking reasoning models have shown exceptional performance on complex reasoning tasks. However, these models often exhibit overthinking (generating redundant reasoning steps for simple problems), leading to excessive computational resource usage. While current mitigation strategies uniformly reduce reasoning tokens, they risk degrading performance on challenging tasks that require extended reasoning. This paper introduces Difficulty-Adaptive Slow Thinking (DAST), a novel framework that enables models to autonomously adjust the length of their Chain-of-Thought (CoT) based on problem difficulty. We first propose a Token Length Budget (TLB) metric to quantify difficulty, then leverage budget-aware reward shaping and budget preference optimization to implement DAST. DAST penalizes overlong responses for simple tasks while incentivizing sufficient reasoning for complex problems. Experiments on diverse datasets and model scales demonstrate that DAST effectively mitigates overthinking (reducing token usage by over 30% on average) while preserving reasoning accuracy on complex problems. Our code and models are available at https://github.com/AnonymousUser0520/AnonymousRepo01.
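The abstract does not give formulas, so the following is only a minimal sketch of how a length budget and a budget-aware reward could fit together. It assumes the budget interpolates between the mean length of correct sampled answers and a maximum length, weighted by sampling accuracy. The function names, the constants 0.5, 0.9 and 0.1, and the exact functional forms are all illustrative assumptions, not taken from the abstract.

```python
def token_length_budget(lengths, correct, max_length):
    """Hypothetical per-problem budget (an assumption, not the paper's definition).

    Easy problems (high sampling accuracy) get a budget near the mean length
    of their correct answers; hard problems get a budget near max_length.
    """
    if not lengths or len(lengths) != len(correct):
        raise ValueError("lengths and correct must be non-empty and aligned")
    correct_lengths = [n for n, ok in zip(lengths, correct) if ok]
    if not correct_lengths:
        # No sampled answer was correct: allow the full length.
        return float(max_length)
    accuracy = len(correct_lengths) / len(lengths)
    mean_correct = sum(correct_lengths) / len(correct_lengths)
    return accuracy * mean_correct + (1.0 - accuracy) * max_length


def shaped_reward(length, is_correct, budget):
    """Hypothetical budget-aware reward; all constants are illustrative."""
    deviation = (length - budget) / budget  # > 0 means over budget
    if is_correct:
        # Correct answers earn more when shorter, with a floor of 0.1.
        return max(0.5 - 0.5 * deviation, 0.1)
    # Incorrect answers are penalized more when they stopped well short
    # of the budget; the penalty is capped at -0.1.
    return min(-0.1 + 0.9 * deviation, -0.1)


# Four sampled answers to one problem: two correct, two incorrect.
lengths = [100, 200, 300, 400]
correct = [True, True, False, False]
budget = token_length_budget(lengths, correct, max_length=1000)
rewards = [shaped_reward(n, ok, budget) for n, ok in zip(lengths, correct)]
```

Under these assumptions, the highest- and lowest-reward samples for a problem could serve as the chosen and rejected responses in a preference-optimization step, which is one plausible reading of "budget preference optimization".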
