Adaptive Overclocking: Dynamic Control of Thinking Path Length via Real-Time Reasoning Signals
This work addresses efficiency issues for users of large reasoning models, though the contribution is incremental because it builds on existing overclocking methods.
The paper tackles computational inefficiency in Large Reasoning Models (LRMs) caused by overthinking. It proposes Adaptive Overclocking, which dynamically adjusts reasoning speed using real-time signals, and reports better accuracy-latency trade-offs on the GSM8K, MATH, and SVAMP benchmarks.
Large Reasoning Models (LRMs) often suffer from computational inefficiency due to overthinking, where a fixed reasoning budget fails to match the varying complexity of tasks. To address this issue, we propose Adaptive Overclocking, a method that makes the overclocking hyperparameter $α$ dynamic and context-aware. Our method adjusts reasoning speed in real time through two complementary signals: (1) token-level model uncertainty for fine-grained step-wise control, and (2) input complexity estimation for informed initialization. We implement this approach with three strategies: Uncertainty-Aware Alpha Scheduling (UA-$α$S), Complexity-Guided Alpha Initialization (CG-$α$I), and a Hybrid Adaptive Control (HAC) that combines both. Experiments on GSM8K, MATH, and SVAMP show that HAC achieves superior accuracy-latency trade-offs, reducing unnecessary computation on simple problems while allocating more resources to challenging ones. By mitigating overthinking, Adaptive Overclocking enhances both efficiency and overall reasoning performance.
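The interplay of the two signals can be illustrated with a minimal sketch. The abstract does not give the update rules, so everything below is an assumption made for illustration: uncertainty is measured as normalized Shannon entropy of the next-token distribution, the complexity estimate is a score in [0, 1], a larger $α$ means faster (shorter) reasoning, and the function names, the linear mappings, and the `rate` smoothing parameter are all hypothetical rather than the authors' formulation.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def init_alpha(complexity, alpha_min=0.0, alpha_max=1.0):
    """CG-alpha-I sketch (assumed linear map): easy inputs start with a large alpha.

    `complexity` is a hypothetical score in [0, 1]; values outside are clamped.
    """
    c = min(max(complexity, 0.0), 1.0)
    return alpha_max - c * (alpha_max - alpha_min)


def update_alpha(alpha, probs, alpha_min=0.0, alpha_max=1.0, rate=0.5):
    """UA-alpha-S sketch (assumed rule): move alpha toward an entropy-based target.

    High uncertainty pulls alpha down (slow down, think longer);
    low uncertainty pulls it up (speed up).
    """
    max_entropy = math.log(len(probs)) if len(probs) > 1 else 1.0
    u = token_entropy(probs) / max_entropy  # 0 = fully certain, 1 = uniform
    target = alpha_max - u * (alpha_max - alpha_min)
    return alpha + rate * (target - alpha)


def hybrid_schedule(complexity, step_probs):
    """HAC sketch: complexity-guided start, then uncertainty-aware updates per step."""
    alpha = init_alpha(complexity)
    trace = [alpha]
    for probs in step_probs:
        alpha = update_alpha(alpha, probs)
        trace.append(alpha)
    return trace
```

Under these assumptions, calling `hybrid_schedule(0.0, [[0.5, 0.5], [0.5, 0.5]])` starts an easy problem at the maximum $α$ and then halves the distance to the minimum at each maximally uncertain step. An actual implementation would take the distributions from the model's logits and apply $α$ inside the overclocking intervention, which this sketch leaves out.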