FastMCTS: A Simple Sampling Strategy for Data Synthesis
This work addresses a bottleneck in data synthesis for enhancing large language models and offers researchers and practitioners a practical, efficient alternative to rejection sampling, though the contribution is incremental because it builds on existing sampling methods.
The paper tackles the inefficiency and imbalance of synthetic multi-step reasoning data generation by introducing FastMCTS, a sampling strategy inspired by Monte Carlo Tree Search. Compared to rejection sampling, FastMCTS generates over 30% more correct reasoning paths as the number of generated tokens scales up, and under comparable synthetic data budgets, models trained on its data score 3.9% higher across multiple benchmarks.
Synthetic high-quality multi-step reasoning data can significantly enhance the performance of large language models on various tasks. However, most existing methods rely on rejection sampling, which generates trajectories independently and suffers from inefficiency and imbalanced sampling across problems of varying difficulty. In this work, we introduce FastMCTS, an innovative data synthesis strategy inspired by Monte Carlo Tree Search. FastMCTS provides a more efficient sampling method for multi-step reasoning data, offering step-level evaluation signals and promoting balanced sampling across problems of different difficulty levels. Experiments on both English and Chinese reasoning datasets demonstrate that FastMCTS generates over 30% more correct reasoning paths compared to rejection sampling as the number of generated tokens scales up. Furthermore, under comparable synthetic data budgets, models trained on FastMCTS-generated data outperform those trained on rejection sampling data by 3.9% across multiple benchmarks. As a lightweight sampling strategy, FastMCTS offers a practical and efficient alternative for synthesizing high-quality reasoning data. Our code will be released soon.
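
To make the baseline concrete, the following is a toy sketch of rejection sampling as the abstract describes it: each trajectory is drawn independently and kept only if it is judged correct. It is not FastMCTS and not the authors' code. The names `rejection_sample` and `synthesize`, the per-problem `solve_prob` values, and the random "verifier" are all hypothetical stand-ins for a real model and answer checker.

```python
import random


def rejection_sample(solve_prob, k, rng):
    """Draw k independent trajectories and keep only the correct ones.

    solve_prob is a hypothetical stand-in for the chance that a single
    independently sampled trajectory reaches the right answer.
    """
    kept = []
    for i in range(k):
        is_correct = rng.random() < solve_prob  # toy verifier, an assumption
        if is_correct:
            kept.append(f"trajectory-{i}")
    return kept


def synthesize(problems, k, seed=0):
    """Apply the same sampling budget k to every problem."""
    rng = random.Random(seed)
    return {name: rejection_sample(p, k, rng) for name, p in problems.items()}


# Hypothetical difficulty levels expressed as per-sample success rates.
problems = {"easy": 1.0, "medium": 0.5, "hard": 0.0}
data = synthesize(problems, k=8)
counts = {name: len(trajs) for name, trajs in data.items()}
print(counts)
```

With an equal budget per problem, the easy problem fills its quota while the hard one contributes nothing, and every rejected trajectory is discarded without reuse. This is the inefficiency and imbalance across difficulty levels that the abstract attributes to rejection sampling.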