CL · AI · Jan 29

TACLer: Tailored Curriculum Reinforcement Learning for Efficient Reasoning

arXiv:2601.21711v1 · h-index: 9
Originality: Incremental advance
AI Analysis

The paper addresses the computational cost and overthinking of LLM reasoning, a concern for AI researchers and practitioners, and offers an incremental improvement over existing methods.

The paper tackles the inefficiency and redundancy of long chain-of-thought (CoT) reasoning in large language models with TACLer, a tailored curriculum reinforcement learning framework. It reports cutting training compute by over 50% relative to long-thinking models, reducing inference token usage by over 42% relative to the base model, and improving accuracy by over 9% over the base model across four math datasets.

Large Language Models (LLMs) have shown remarkable performance on complex reasoning tasks, especially when equipped with long chain-of-thought (CoT) reasoning. However, eliciting long CoT typically requires large-scale reinforcement learning (RL) training, which often leads to overthinking with redundant intermediate steps. To improve learning and reasoning efficiency, while preserving or even enhancing performance, we propose TACLer, a model-tailored curriculum reinforcement learning framework that gradually increases the complexity of the data based on the model's proficiency in multi-stage RL training. TACLer features two core components: (i) tailored curriculum learning that determines what knowledge the model lacks and needs to learn in progressive stages; (ii) a hybrid Thinking/NoThinking reasoning paradigm that balances accuracy and efficiency by enabling or disabling the Thinking mode. Our experiments show that TACLer yields a twofold advantage in learning and reasoning: (i) it reduces computational cost, cutting training compute by over 50% compared to long thinking models and reducing inference token usage by over 42% relative to the base model; and (ii) it improves accuracy by over 9% over the base model, consistently outperforming state-of-the-art NoThinking and Thinking baselines across four math datasets with complex problems.
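The abstract describes the two components only at a high level. The Python sketch below illustrates one way they could fit together; it is not the authors' implementation. The proficiency measure (exact-match pass rate over k samples), the `MASTERED` threshold, the integer difficulty levels, the `User:/Assistant:` prompt template, the empty-`<think>` prefill used for NoThinking mode, and every function name are hypothetical assumptions, and the RL update between stages is omitted.

```python
"""Hypothetical sketch of TACLer's two components, as described in the abstract.

Assumptions (not from the paper): proficiency = exact-match pass rate over k
samples; a problem is "mastered" once its pass rate reaches MASTERED; NoThinking
mode is emulated by prefilling an empty <think></think> block.
"""
from dataclasses import dataclass
from typing import Callable, List

MASTERED = 1.0  # hypothetical threshold: solved on every sample


@dataclass
class Problem:
    question: str
    answer: str
    difficulty: int  # hypothetical difficulty level, higher = harder


def pass_rate(solve: Callable[[str], str], p: Problem, k: int = 4) -> float:
    """Fraction of k sampled answers that exactly match the reference answer."""
    hits = sum(solve(p.question) == p.answer for _ in range(k))
    return hits / k


def build_stage(solve: Callable[[str], str], pool: List[Problem],
                max_level: int, k: int = 4) -> List[Problem]:
    """Keep problems up to max_level that the current model has not mastered."""
    candidates = [p for p in pool if p.difficulty <= max_level]
    return [p for p in candidates if pass_rate(solve, p, k) < MASTERED]


def build_prompt(question: str, thinking: bool) -> str:
    """Thinking mode leaves <think> open; NoThinking closes it immediately."""
    prefix = f"User: {question}\nAssistant: <think>\n"
    return prefix if thinking else prefix + "</think>\n"


if __name__ == "__main__":
    pool = [
        Problem("1+1", "2", 1),
        Problem("12*12", "144", 2),
        Problem("17*23", "391", 3),
    ]
    known = {"1+1": "2", "12*12": "144"}  # toy stand-in for the policy

    def solve(q: str) -> str:
        return known.get(q, "?")

    # Progressive stages of increasing difficulty; an RL update on each
    # stage's problems (omitted here) would change `solve` between stages.
    for level in (1, 2, 3):
        stage = build_stage(solve, pool, max_level=level)
        print(level, [p.question for p in stage])
    print(build_prompt("17*23", thinking=False))
```

Because the toy solver never changes, only the unmastered level-3 problem is selected; in a real multi-stage run, each stage's RL training would shift which problems count as mastered before the next, harder stage is built.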
