Mid-Think: Training-Free Intermediate-Budget Reasoning via Token-Level Triggers
For practitioners of hybrid reasoning models, this work provides a simple, training-free method to control reasoning budget and improve efficiency in both inference and RL training.
The paper identifies that reasoning behavior in hybrid language models is controlled by specific trigger tokens rather than high-level instructions, and proposes Mid-Think, a training-free prompting method that combines these triggers for intermediate-budget reasoning. Mid-Think improves accuracy-length trade-off over baselines and, when applied to RL training, reduces training time by ~15% while boosting Qwen3-8B's AIME accuracy from 69.8% to 72.4% and GPQA from 58.5% to 61.1%.
Hybrid reasoning language models are commonly controlled through high-level Think/No-think instructions to regulate reasoning behavior, yet we found that such mode switching is largely driven by a small set of trigger tokens rather than the instructions themselves. Through attention analysis and controlled prompting experiments, we show that a leading ``Okay'' token induces reasoning behavior, while the newline pattern following ``</think>'' suppresses it. Based on this observation, we propose Mid-Think, a simple training-free prompting format that combines these triggers to achieve intermediate-budget reasoning, consistently outperforming fixed-token and prompt-based baselines in terms of the accuracy-length trade-off. Furthermore, applying Mid-Think to RL training after SFT reduces training time by approximately 15% while improving final performance of Qwen3-8B on AIME from 69.8% to 72.4% and on GPQA from 58.5% to 61.1%, demonstrating its effectiveness for both inference-time control and RL-based reasoning training.