CL · Apr 13, 2025

Leveraging Reasoning Model Answers to Enhance Non-Reasoning Model Capability

Haotian Wang, Han Zhao, Shuaiting Chen, Xiaoyu Tian, Sitong Zhao, Yunjie Ji, Yiping Peng, Xiangang Li

arXiv:2504.09639v14.91 citations · h-index: 23 · Has Code

Originality: Incremental advance

AI Analysis

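This incremental approach targets deployment efficiency: it aims to make less computationally demanding, non-reasoning models more capable, so that strong answers do not require the extra inference cost of test-time "thinking".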

The paper studies how to improve non-reasoning language models using outputs from reasoning-intensive models, and reports consistent performance gains across benchmarks from supervised fine-tuning.

Recent advancements in large language models (LLMs), such as DeepSeek-R1 and OpenAI-o1, have demonstrated the significant effectiveness of test-time scaling, achieving substantial performance gains across various benchmarks. These advanced models utilize deliberate "thinking" steps to systematically enhance answer quality. In this paper, we propose leveraging the high-quality outputs generated by reasoning-intensive models to improve less computationally demanding, non-reasoning models. We explore and compare methodologies for utilizing the answers produced by reasoning models to train and improve non-reasoning models. Through straightforward Supervised Fine-Tuning (SFT) experiments, we demonstrate consistent improvements across various established benchmarks, underscoring the potential of this approach for advancing the ability of models to answer questions directly.
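To make the idea in the abstract concrete, here is a minimal sketch of one way reasoning-model outputs could be turned into SFT pairs for a non-reasoning model. It is not the authors' pipeline. It assumes the reasoning model wraps its deliberation in `<think>...</think>` tags, and the helper names (`strip_thinking`, `build_sft_examples`) and the `prompt`/`response` field names are hypothetical, chosen only for illustration. It shows a single option, keeping just the final answer, whereas the paper compares several ways of using the answers.

```python
import re

# Assumption: the reasoning trace is delimited by <think>...</think>.
# DOTALL lets the pattern span multi-line traces; ".*?" keeps it non-greedy.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_thinking(raw_output: str) -> str:
    """Drop the reasoning trace and return only the final answer text."""
    return THINK_BLOCK.sub("", raw_output).strip()


def build_sft_examples(records):
    """Turn (question, raw reasoning-model output) pairs into SFT examples.

    Outputs that contain no text outside the reasoning trace are skipped,
    since they give the non-reasoning model nothing to imitate.
    """
    examples = []
    for question, raw_output in records:
        answer = strip_thinking(raw_output)
        if not answer:
            continue
        # Hypothetical field names; adapt to whatever the SFT trainer expects.
        examples.append({"prompt": question, "response": answer})
    return examples


# Toy records standing in for real reasoning-model generations.
records = [
    ("What is 17 * 3?", "<think>17 * 3 = 51.</think>\nThe answer is 51."),
    ("Name a prime above 10.", "<think>11 is prime.</think>"),
]
sft_data = build_sft_examples(records)
```

The resulting list of prompt/response pairs can then be passed to any standard SFT setup. Whether to keep, summarize, or discard the reasoning trace is exactly the kind of design choice the paper evaluates, so this filter should be treated as a starting point rather than a recommendation.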
