CL · Apr 13, 2025

Leveraging Reasoning Model Answers to Enhance Non-Reasoning Model Capability

arXiv:2504.09639v1 · 1 citation · h-index: 11 · Has Code
Originality: Incremental advance
AI Analysis

This incremental approach targets deployment efficiency: obtaining capable models at lower computational cost.

The paper tackles the problem of improving non-reasoning language models by using outputs from reasoning-intensive models, demonstrating consistent performance gains across benchmarks through supervised fine-tuning.

Recent advancements in large language models (LLMs), such as DeepSeek-R1 and OpenAI-o1, have demonstrated the significant effectiveness of test-time scaling, achieving substantial performance gains across various benchmarks. These advanced models utilize deliberate "thinking" steps to systematically enhance answer quality. In this paper, we propose leveraging the high-quality outputs generated by reasoning-intensive models to improve less computationally demanding, non-reasoning models. We explore and compare methodologies for utilizing the answers produced by reasoning models to train and improve non-reasoning models. Through straightforward Supervised Fine-Tuning (SFT) experiments on established benchmarks, we demonstrate consistent improvements, underscoring the potential of this approach for advancing the ability of models to answer questions directly.
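To make the idea concrete, here is a minimal sketch of how reasoning-model outputs might be turned into SFT training pairs for a non-reasoning model. It is an illustration, not the paper's pipeline: it assumes the reasoning model wraps its deliberation in `<think>...</think>` tags, and the function names, the `prompt`/`completion` record layout, and the `keep_thinking` switch are hypothetical choices made for this example.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumption: the reasoning model marks its deliberation with <think>...</think>.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", flags=re.DOTALL)


def split_reasoning(output: str) -> Dict[str, str]:
    """Separate the thinking trace from the final answer text."""
    thinking = "\n".join(t.strip() for t in THINK_BLOCK.findall(output))
    answer = THINK_BLOCK.sub("", output).strip()
    return {"thinking": thinking, "answer": answer}


def build_sft_example(
    question: str, reasoning_output: str, keep_thinking: bool = False
) -> Dict[str, str]:
    """Build one prompt/completion pair (hypothetical record layout).

    keep_thinking=False trains on the final answer only;
    keep_thinking=True prepends the thinking trace to the answer.
    """
    parts = split_reasoning(reasoning_output)
    target = parts["answer"]
    if keep_thinking and parts["thinking"]:
        target = f"{parts['thinking']}\n\n{parts['answer']}"
    return {"prompt": question, "completion": target}


def build_sft_dataset(
    pairs: Iterable[Tuple[str, str]], keep_thinking: bool = False
) -> List[Dict[str, str]]:
    """Convert (question, reasoning_output) pairs, dropping empty targets."""
    examples = [build_sft_example(q, o, keep_thinking) for q, o in pairs]
    return [ex for ex in examples if ex["completion"]]


if __name__ == "__main__":
    raw = "<think>2 + 2 is 4.</think>\nThe answer is 4."
    print(build_sft_example("What is 2 + 2?", raw))
    print(build_sft_example("What is 2 + 2?", raw, keep_thinking=True))
```

The resulting records could then be passed to any standard SFT trainer. Toggling `keep_thinking` is one simple way to compare different uses of the reasoning model's output, in the spirit of the comparison the abstract describes.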

Code Implementations: 1 repo
