LG · May 16

Why Do Reasoning Models Lose Coverage? The Role of Data and Forks in the Road

Ngoc-Hieu Nguyen, Parshin Shojaee, Phuc Minh Nguyen, Nan Zhang, Chandan K Reddy, Khoa D Doan, Rui Zhang

arXiv:2605.17026

AI Analysis

For researchers and practitioners working on post-training of reasoning models, this paper identifies a data-centric cause of coverage shrinkage and offers mitigation strategies.

The paper investigates why reasoning models lose coverage (pass@k degrades) after SFT-based post-training and identifies decision-point scenarios in the training data as the key driver. It shows that targeted data synthesis and diversity-encouraging decoding can partially mitigate this shrinkage.

Recent progress in large language models has led to the emergence of reasoning models, which have shown strong performance on complex tasks through specialized fine-tuning procedures. While these methods reliably improve pass@1 accuracy, prior work has observed that they exhibit coverage shrinkage, where pass@k degrades relative to the base model. In this paper, we investigate why this shrinkage arises under SFT-based post-training. We hypothesize that the behavior is driven by properties of the fine-tuning data, specifically decision points, or "forks in the road" scenarios, where the model faces indecipherable patterns with multiple valid reasoning paths. To test this hypothesis, we design controlled case studies that simulate such decision-point settings, spanning indecipherable nodes in graph branching and reasoning modes. By tracking post-training dynamics in these settings, we find that the shrinkage phenomenon is tightly correlated with the prevalence of decision-point scenarios in the training data. We also demonstrate that this shrinkage can be partially mitigated through targeted design of decision points in data synthesis and through a more systematic diversity-encouraging decoding mechanism. Our findings identify data-centric factors as a key driver of shrinkage in reasoning models and highlight diversity-aware designs as an effective lever for controlling it.
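
To make the pass@1 versus pass@k distinction concrete, here is a minimal sketch of the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them are correct. The abstract does not say which estimator the authors use, so this is an assumption. The function names and the per-problem counts below are hypothetical and chosen only for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which are correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset has a correct one.
        return 1.0
    # One minus the probability that a random size-k subset has no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average the per-problem pass@k estimate over a set of problems."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical counts (not from the paper): 16 samples on each of two problems.
base_counts = [4, 4]    # base model: sometimes right on both problems
tuned_counts = [16, 0]  # tuned model: always right on one, never on the other

for k in (1, 8):
    base = mean_pass_at_k(base_counts, n=16, k=k)
    tuned = mean_pass_at_k(tuned_counts, n=16, k=k)
    print(f"pass@{k}: base={base:.3f} tuned={tuned:.3f}")
```

With these made-up counts the tuned model doubles pass@1 (0.5 versus 0.25) but has lower pass@8 (0.5 versus roughly 0.96), which is the pattern the abstract calls coverage shrinkage.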
