DART: Difficulty-Adaptive Reasoning Truncation for Efficient Large Language Models
DART addresses the computational inefficiency of LLM reasoning for users who need faster inference, though the contribution is incremental because it builds on existing adaptive reasoning methods.
The paper tackles the inefficiency of chain-of-thought reasoning in large language models by proposing DART, a supervised difficulty-adaptive truncation framework that adjusts thinking length to problem difficulty. With DeepSeek-R1-Distill-Qwen-7B on the GSM8K dataset, it achieves 81.2% reasoning truncation and 5.33x computational acceleration while preserving or improving accuracy.
Adaptive reasoning is essential for aligning the computational effort of large language models (LLMs) with the intrinsic difficulty of problems. Current chain-of-thought methods boost reasoning ability but generate long explanations indiscriminately, regardless of how hard the problem is, which wastes computation. Meanwhile, existing reinforcement learning approaches to adaptive thinking remain unstable and heavily reward-dependent. Here we propose \textbf{DART}, a supervised \textbf{D}ifficulty-\textbf{A}daptive \textbf{R}easoning \textbf{T}runcation framework that adjusts thinking length according to problem difficulty. By distilling concise reasoning patterns from stronger models, interpolating them into a continuum of reasoning styles, and curating optimal training data that balances correctness and compactness, DART learns when to ``stop thinking''. Across multiple mathematical benchmarks, experiments show that DART substantially improves efficiency while preserving or improving accuracy, achieving 81.2\% reasoning truncation (DeepSeek-R1-Distill-Qwen-7B on the GSM8K dataset) with 5.33$\times$ computational acceleration. DART provides a stable and general paradigm for efficient reasoning, advancing the development of adaptive intelligence in LLMs.
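The curation step, which balances correctness against compactness, can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that each problem comes with several candidate reasoning traces of different lengths, that correctness is judged by exact match on the final answer, and that length is approximated by whitespace-separated tokens. The names `Trace`, `token_len`, `pick_shortest_correct`, and `truncation_ratio` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trace:
    """One candidate reasoning trace for a problem (hypothetical structure)."""
    reasoning: str
    answer: str


def token_len(text: str) -> int:
    # Assumption: whitespace splitting stands in for a real tokenizer.
    return len(text.split())


def pick_shortest_correct(traces: List[Trace], gold: str) -> Optional[Trace]:
    """Return the most compact trace whose final answer matches the gold answer.

    Returns None when no candidate is correct, so the problem can be
    skipped rather than trained on a wrong trace.
    """
    correct = [t for t in traces if t.answer.strip() == gold.strip()]
    if not correct:
        return None
    return min(correct, key=lambda t: token_len(t.reasoning))


def truncation_ratio(original_tokens: int, kept_tokens: int) -> float:
    """Fraction of reasoning tokens removed relative to the original trace."""
    return 1.0 - kept_tokens / original_tokens


# Toy candidates for the problem "2 + 3 = ?" with gold answer "5".
candidates = [
    Trace("First add 2 and 3 to get 5 then double check by counting up", "5"),
    Trace("2 + 3 = 5", "5"),
    Trace("6", "6"),  # shortest, but wrong, so it must not be selected
]
best = pick_shortest_correct(candidates, gold="5")
```

Under these assumptions, harder problems would naturally keep longer traces, because their shorter candidates are more likely to be wrong and get filtered out. That is one plausible way a supervised target could encode difficulty-dependent length.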