CL · Sep 5, 2025

Less is More Tokens: Efficient Math Reasoning via Difficulty-Aware Chain-of-Thought Distillation

arXiv:2509.05226v1 · h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses inefficient reasoning in AI models, particularly on math tasks, by enabling proportional thinking without architectural changes. It is an incremental advance, since it builds on existing chain-of-thought and fine-tuning methods.

The paper tackles the problem of unnecessarily verbose chain-of-thought reasoning by introducing a difficulty-aware framework that teaches models to dynamically adjust reasoning depth based on problem complexity, resulting in reduced reasoning length while maintaining or improving performance.
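This summary does not say how difficulty is measured or how traces are matched to it. The sketch below is therefore illustrative only. It assumes a difficulty score in [0, 1] (for example, a solver failure rate), a hypothetical linear token budget between `min_tokens` and `max_tokens`, and a hypothetical rule that keeps the correct trace whose length is closest to that budget. None of these names or parameters come from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trace:
    """One candidate chain-of-thought trace for a problem (hypothetical schema)."""
    text: str
    num_tokens: int
    correct: bool


def token_budget(difficulty: float, min_tokens: int = 128, max_tokens: int = 2048) -> int:
    """Hypothetical linear budget: difficulty 0 maps to min_tokens, difficulty 1 to max_tokens."""
    d = min(max(difficulty, 0.0), 1.0)  # clamp the assumed score into [0, 1]
    return round(min_tokens + d * (max_tokens - min_tokens))


def select_proportional_trace(traces: List[Trace], difficulty: float,
                              min_tokens: int = 128, max_tokens: int = 2048) -> Optional[Trace]:
    """Keep the correct trace whose length is closest to the difficulty-scaled budget.

    Returns None when no candidate is correct, so the problem is dropped from the SFT set.
    """
    budget = token_budget(difficulty, min_tokens, max_tokens)
    correct = [t for t in traces if t.correct]
    if not correct:
        return None
    return min(correct, key=lambda t: abs(t.num_tokens - budget))


if __name__ == "__main__":
    candidates = [Trace("short", 300, True), Trace("long", 1500, True), Trace("wrong", 200, False)]
    print(select_proportional_trace(candidates, difficulty=0.1).text)  # easy problem
    print(select_proportional_trace(candidates, difficulty=0.9).text)  # hard problem
```

Under these assumptions, an easy problem keeps the short correct trace and a hard one keeps the longer trace. This mirrors the idea of traces "proportional in length to problem difficulty" without claiming to reproduce the authors' curation pipeline.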

Chain-of-thought reasoning, while powerful, can produce unnecessarily verbose output for simpler problems. We present a framework for difficulty-aware reasoning that teaches models to dynamically adjust reasoning depth based on problem complexity. Remarkably, we show that models can be endowed with such dynamic inference pathways without any architectural modifications; we simply post-train on data that is carefully curated to include chain-of-thought traces that are proportional in length to problem difficulty. Our analysis reveals that post-training via supervised fine-tuning (SFT) primarily captures patterns like reasoning length and format, while direct preference optimization (DPO) preserves reasoning accuracy, with their combination reducing length and maintaining or improving performance. Both quantitative metrics and qualitative assessments confirm that models can learn to "think proportionally", reasoning minimally on simple problems while maintaining depth for complex ones.
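The abstract pairs SFT with direct preference optimization (DPO). For reference, the standard per-pair DPO loss is −log σ(β[(log π(y_w|x) − log π_ref(y_w|x)) − (log π(y_l|x) − log π_ref(y_l|x))]), where y_w is the preferred response and y_l the rejected one. The sketch below computes that loss from precomputed sequence log-probabilities. It also includes a pairing rule, `build_preference_pair`, that is purely a hypothetical assumption: prefer the shortest correct trace, and reject an incorrect trace or, failing that, the longest correct one. The paper's actual pair construction is not described here.

```python
import math
from typing import List, Optional, Tuple

# (trace_text, num_tokens, is_correct): hypothetical record layout
TraceRec = Tuple[str, int, bool]


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair, from summed sequence log-probs."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch keeps exp() from overflowing.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def build_preference_pair(traces: List[TraceRec]) -> Optional[Tuple[TraceRec, TraceRec]]:
    """Hypothetical rule: chosen = shortest correct trace; rejected = an incorrect
    trace if any exists, else the longest correct trace. None if no usable pair."""
    correct = [t for t in traces if t[2]]
    incorrect = [t for t in traces if not t[2]]
    if not correct:
        return None
    chosen = min(correct, key=lambda t: t[1])
    if incorrect:
        return chosen, incorrect[0]
    rejected = max(correct, key=lambda t: t[1])
    if rejected is chosen:  # only one distinct candidate, nothing to contrast
        return None
    return chosen, rejected


if __name__ == "__main__":
    print(dpo_loss(-1.0, -5.0, -3.0, -3.0))  # policy favours the chosen trace: loss < log 2
    print(build_preference_pair([("brief", 250, True), ("verbose", 1800, True)]))
```

When the policy matches the reference model, the margin is zero and the loss equals log 2. Raising the chosen trace's log-probability relative to the reference lowers the loss, which is the mechanism a shorter-is-preferred pairing would exploit.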
