AI · May 4

Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding

Taewon Yun, Jisu Shin, Jeonghwan Choi, Seunghwan Bang, Hwanjun Song

arXiv:2605.02290 · 97.4 · Has Code

Predicted impact: top 2% in AI · last 90 days
Originality: Incremental advance

AI Analysis

For practitioners who need to distill large reasoning models efficiently, CoRD produces high-quality reasoning data without substantial efficiency overhead, though it is an incremental improvement over existing curation-based methods.

CoRD proposes a collaborative multi-teacher decoding framework that uses step-wise reasoning synthesis with perplexity-based scoring and beam search to distill Long-CoT reasoning, achieving near teacher-level student performance with fewer supervision signals.

Distilling large reasoning models is essential for making Long-CoT reasoning practical, as full-scale inference remains computationally prohibitive. Existing curation-based approaches select complete reasoning traces post-hoc, overlooking collaboration among heterogeneous teachers and lacking dynamic exploration, which leads to redundant sampling and missed complementary reasoning. We introduce CoRD, a collaborative multi-teacher decoding framework that performs step-wise reasoning synthesis guided by predictive perplexity-based scoring and beam search. This enables heterogeneous LRMs to jointly construct coherent reasoning trajectories while efficiently preserving diverse, high-potential hypotheses. Experiments show that CoRD produces higher-quality reasoning data and achieves near teacher-level student performance with fewer, structured supervision signals, without substantial efficiency overhead. CoRD further generalizes well to out-of-domain and open-ended settings. The dataset and model are available at https://github.com/DISL-Lab/CoRD.
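
The step-wise idea in the abstract can be illustrated with a toy sketch: at each step every teacher proposes one continuation, each candidate is scored by perplexity, and a beam keeps the lowest-cost partial trajectories. This is not the CoRD implementation. The `Teacher` and `Scorer` interfaces, the `DONE` stop marker, the toy teachers, and the use of plain perplexity (rather than the paper's predictive perplexity-based score, whose exact form the abstract does not give) are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces (assumptions, not from the CoRD codebase):
# a teacher maps the steps so far to one proposed next step;
# a scorer maps (steps so far, candidate step) to per-token log-probabilities.
Teacher = Callable[[Sequence[str]], str]
Scorer = Callable[[Sequence[str], str], List[float]]
Beam = Tuple[List[str], float]  # (steps, cumulative log-perplexity)


def perplexity(token_logprobs: List[float]) -> float:
    """exp of the mean negative log-probability (assumes a non-empty list)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def stepwise_beam_search(teachers: Sequence[Teacher], scorer: Scorer,
                         beam_width: int = 2, max_steps: int = 4,
                         stop: str = "DONE") -> List[Beam]:
    beams: List[Beam] = [([], 0.0)]
    for _ in range(max_steps):
        candidates: List[Beam] = []
        for steps, cost in beams:
            if steps and steps[-1] == stop:
                candidates.append((steps, cost))  # finished: carry over as-is
                continue
            for teacher in teachers:  # every teacher extends every open beam
                step = teacher(steps)
                ppl = perplexity(scorer(steps, step))
                candidates.append((steps + [step], cost + math.log(ppl)))
        # Keep the beam_width distinct trajectories with the lowest cost.
        seen, beams = set(), []
        for steps, cost in sorted(candidates, key=lambda c: c[1]):
            if tuple(steps) not in seen:
                seen.add(tuple(steps))
                beams.append((steps, cost))
            if len(beams) == beam_width:
                break
        if all(s and s[-1] == stop for s, _ in beams):
            break
    return beams


# Toy stand-ins for two heterogeneous teachers and a scoring model.
def teacher_a(steps: Sequence[str]) -> str:
    return "DONE" if len(steps) >= 2 else f"a{len(steps)}"


def teacher_b(steps: Sequence[str]) -> str:
    return "DONE" if len(steps) >= 2 else f"b{len(steps)}"


def toy_scorer(steps: Sequence[str], step: str) -> List[float]:
    # Pretend teacher A is more fluent on even steps and teacher B on odd ones.
    preferred = "a" if len(steps) % 2 == 0 else "b"
    good = step == "DONE" or step.startswith(preferred)
    return [-0.1] if good else [-1.0]


beams = stepwise_beam_search([teacher_a, teacher_b], toy_scorer)
best_steps, best_cost = beams[0]
```

With these toy inputs the best trajectory interleaves the two teachers (`a0`, then `b1`, then `DONE`), which a selector over complete single-teacher traces could not produce. The sketch sums per-step log-perplexity without length normalization, so a real system would likely need to correct for trajectories of different lengths.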
