Think When You Need: Self-Adaptive Chain-of-Thought Learning
This work addresses inefficient reasoning in language models for AI applications, though the contribution is incremental because it builds on existing CoT methods.
The paper tackles inefficient 'overthinking' in Chain-of-Thought reasoning with a method that adapts reasoning length to problem complexity, maintaining accuracy while producing significantly more concise explanations across multiple benchmarks.
Chain of Thought (CoT) reasoning enhances language models' performance but often leads to inefficient "overthinking" on simple problems. We identify that existing approaches that directly penalize reasoning length fail to account for varying problem complexity. Our approach constructs rewards through length and quality comparisons, guided by theoretical assumptions that jointly enhance solution correctness and conciseness. Moreover, we extend our method to fuzzy tasks where ground truth is unavailable. Experiments across multiple reasoning benchmarks demonstrate that our method maintains accuracy while generating significantly more concise explanations, effectively teaching models to "think when needed."
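To make the idea of rewards built from length and quality comparisons concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's actual reward formulation: the `Sample` type, the `pairwise_rewards` function, the unit win score, and the normalization by group size are all hypothetical. The sketch assumes several responses are sampled for the same problem, that a correct response beats an incorrect one, and that among correct responses the shorter one beats the longer one.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One sampled response to a single problem (hypothetical container)."""
    text: str
    correct: bool  # quality signal, e.g. match against ground truth
    length: int    # reasoning length, e.g. token count


def pairwise_rewards(samples: list[Sample]) -> list[float]:
    """Score each sample by the fraction of group peers it beats.

    Assumed comparison rules (illustrative only):
      * quality: a correct sample beats an incorrect one;
      * length: among two correct samples, the shorter one wins.
    An incorrect sample never wins, so brevity alone earns nothing.
    """
    n = len(samples)
    if n < 2:
        return [0.0] * n

    rewards = [0.0] * n
    for i, a in enumerate(samples):
        for j, b in enumerate(samples):
            if i == j:
                continue
            if a.correct and not b.correct:
                rewards[i] += 1.0  # quality comparison
            elif a.correct and b.correct and a.length < b.length:
                rewards[i] += 1.0  # length comparison among correct samples
    # Normalize so rewards lie in [0, 1] regardless of group size.
    return [r / (n - 1) for r in rewards]


if __name__ == "__main__":
    group = [
        Sample("short and right", correct=True, length=10),
        Sample("long and right", correct=True, length=50),
        Sample("short but wrong", correct=False, length=5),
    ]
    print(pairwise_rewards(group))
```

Because lengths are compared only within the group sampled for one problem, a hard problem whose correct answers are all long is not penalized against an easy one. This is one way to read the abstract's point about accounting for varying problem complexity, in contrast to a fixed per-token length penalty.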