CL, AI · Jun 1

A Primer in Post-Training Reasoning Data: What We Know About How It Works

Yaoming Li, Guangxiang Zhao, Qilong Shi, Lin Sun, Xiangzheng Zhang, Tong Yang

arXiv:2606.02113 · 67.41 citations

AI Analysis

For researchers and practitioners working on post-training of large reasoning models, this paper offers a structured overview of a scattered literature, but it is a survey, not a novel contribution.

This primer synthesizes over 150 studies on post-training reasoning data, organizing the field around four key questions to provide an attribution framework for future work.

Post-training has become a primary driver of recent progress in large reasoning models, and reasoning data are often the key variable determining whether this stage succeeds. Work on post-training reasoning data has grown rapidly, yet this literature remains scattered across dataset papers, reinforcement-learning recipes, reward-model studies, benchmarks, and frontier system reports. This paper is the first primer to synthesize over 150 key public studies and system reports on post-training reasoning data. We organize the field around four questions: what data objects exist, what makes them useful, how they are constructed, and how they scale. Together, these four questions provide an attribution framework for future reasoning-data releases and post-training recipes.

