CL · May 9

Hint Tuning: Less Data Makes Better Reasoners

Siqi Fan, Minghao Li, Xiaoqian Ma, Xiusheng Huang, Zhuo Chen, Bowen Qin, Liujie Zhang, Shuo Shang, Weihang Chen

arXiv:2605.08665

AI Analysis

For practitioners deploying large reasoning models, this method cuts inference-time token usage while keeping accuracy competitive, offering a data-efficient alternative to distillation or reinforcement learning.

Hint Tuning reduces token usage by 24-66% (31.5% average) across multiple reasoning models and scales while maintaining competitive accuracy on five benchmarks, using only 1K self-annotated samples.

Large reasoning models achieve high accuracy through extended chain-of-thought but generate 5-8× more tokens than necessary, applying verbose reasoning uniformly regardless of problem difficulty. We propose Hint Tuning, a data-efficient approach that teaches models to calibrate reasoning depth. Our key insight: the corresponding instruct model serves as an ideal difficulty probe. By testing what the instruct model can solve with varying guidance, we automatically construct training data across three states: No-Hint (direct answer), Sparse-Hint (minimal prefix), and Full-Hint (complete reasoning). This converts the abstract challenge of difficulty labeling into a measurable consistency check between the instruct and reasoning models. With only 1K self-annotated samples, Hint Tuning achieves 24-66% token reduction (31.5% average) across mainstream reasoning models (Qwen3-Thinking, DeepSeek-R1-Distill) at multiple scales (4B-32B) while maintaining competitive accuracy on five benchmarks. Unlike methods requiring massive distillation datasets or expensive RL, we achieve superior efficiency through simple alignment with the instruct model's capabilities.
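The three-state labeling described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the function name `label_hint_state`, the `sparse_fraction` parameter, the word-level prefix cut, and the exact-match consistency check are all assumptions made for illustration. The abstract only states that the instruct model is probed with varying guidance and compared against the reasoning model.

```python
from typing import Callable, Optional

# Hypothetical interface: an instruct model is any callable that takes a
# question plus an optional hint (a reasoning prefix) and returns an answer.
InstructModel = Callable[[str, Optional[str]], str]


def label_hint_state(
    question: str,
    reasoning_trace: str,
    reasoning_answer: str,
    instruct_model: InstructModel,
    sparse_fraction: float = 0.2,  # assumed prefix size, not from the paper
) -> str:
    """Assign one of 'no_hint', 'sparse_hint', 'full_hint' to a problem."""
    target = reasoning_answer.strip()

    # State 1: the instruct model agrees with the reasoning model unaided.
    if instruct_model(question, None).strip() == target:
        return "no_hint"

    # State 2: it agrees once given a short prefix of the reasoning trace.
    words = reasoning_trace.split()
    cutoff = max(1, int(len(words) * sparse_fraction))
    prefix = " ".join(words[:cutoff])
    if instruct_model(question, prefix).strip() == target:
        return "sparse_hint"

    # State 3: otherwise the problem is treated as needing full reasoning.
    return "full_hint"


def toy_instruct(question: str, hint: Optional[str]) -> str:
    """Stand-in model: solves one problem directly, one only with a hint."""
    if question == "1+1":
        return "2"
    if question == "12*12" and hint is not None:
        return "144"
    return "unknown"
```

With the toy model, `label_hint_state("1+1", "one plus one is two", "2", toy_instruct)` falls into the No-Hint state, while a problem the toy model cannot solve even with a prefix falls through to Full-Hint. In a real pipeline the callable would wrap an actual instruct model, and the answer comparison would likely need something more robust than exact string matching.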
