LG · AI · CL · Mar 3, 2025

Marco-o1 v2: Towards Widening The Distillation Bottleneck for Reasoning Models

arXiv:2503.01461v2 · 15 citations · h-index: 13 · Has Code · ACL
Originality: Incremental advance
AI Analysis

This paper addresses a critical problem for developers of efficient reasoning models by providing methods to enhance distillation, though it appears incremental because it builds on existing distillation techniques.

The paper tackles the bottleneck in distilling reasoning capabilities from large models to smaller ones, where long chain-of-thought data causes learning difficulties and biases such as over-thinking. The proposed methods yield a substantial improvement in reasoning performance on benchmarks such as GSM8K, MATH, AIME, Multi-IF, and Blocksworld, which the authors attribute to fewer hallucinations in long-time thinking.

Large Reasoning Models (LRMs) such as OpenAI o1 and DeepSeek-R1 have shown remarkable reasoning capabilities by scaling test-time compute and generating long Chain-of-Thought (CoT). Distillation, i.e., post-training on LRM-generated data, is a straightforward yet effective method to enhance the reasoning abilities of smaller models, but it faces a critical bottleneck: we found that distilled long CoT data poses learning difficulty for small models and leads to the inheritance of biases (i.e., over-thinking) when using Supervised Fine-tuning (SFT) and Reinforcement Learning (RL) methods. To alleviate this bottleneck, we propose constructing tree-based CoT data from scratch via Monte Carlo Tree Search (MCTS). We then exploit a set of CoT-aware approaches, including Thoughts Length Balance, Fine-grained DPO, and Joint Post-training Objective, to enhance SFT and RL on the constructed data. We conduct evaluation on various benchmarks covering math (GSM8K, MATH, AIME), instruction-following (Multi-IF), and planning (Blocksworld). The results demonstrate that our approaches substantially improve the reasoning performance of distilled models compared to standard distilled models by reducing hallucinations in long-time thinking. The project homepage is https://github.com/AIDC-AI/Marco-o1.
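For readers unfamiliar with the preference-optimization step mentioned in the abstract, the sketch below shows the standard (sequence-level) DPO loss for a single preference pair. It is not the paper's Fine-grained DPO, whose details the abstract does not give. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each response under
    the trainable policy and under a frozen reference model.
    NOTE: this is the vanilla sequence-level loss, shown for
    orientation only; it is an assumption-laden stand-in, not the
    paper's Fine-grained DPO.
    """
    # Implicit reward margin: how much more the policy (relative to
    # the reference) prefers the chosen response over the rejected one.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # evaluated in a numerically stable way for either sign.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and reference agree on both responses the margin is zero and the loss equals log 2; the loss falls below that value as the policy shifts probability toward the chosen response relative to the reference.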
