LG · AI · CL · May 22, 2025

AdaSTaR: Adaptive Data Sampling for Training Self-Taught Reasoners

arXiv:2505.16322v3 · 5 citations · h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses inefficiencies in training self-improving language models, offering a method that improves performance and reduces computational cost. It is an incremental advance, since it builds on the existing self-taught reasoner framework.

The paper tackles the inefficient training of self-improving reasoning language models caused by random data sampling. It introduces AdaSTaR, an adaptive sampling algorithm that achieves the best test accuracy on all six benchmarks and reduces training FLOPs by an average of 58.6% compared to baselines.

Self-Taught Reasoners (STaR), synonymously known as Rejection sampling Fine-Tuning (RFT), is an integral part of the training pipeline of self-improving reasoning Language Models (LMs). The self-improving mechanism often employs random observation (data) sampling. However, this results in an imbalance in how often each observation is trained on: the model is inefficiently over-trained on solved examples and under-trained on challenging ones. In response, we introduce Adaptive STaR (AdaSTaR), a novel algorithm that rectifies this by integrating two adaptive sampling principles: (1) Adaptive Sampling for Diversity: promoting balanced training across observations, and (2) Adaptive Sampling for Curriculum: dynamically adjusting data difficulty to match the model's evolving strength. Across six benchmarks, AdaSTaR achieves the best test accuracy in all instances (6/6) and reduces training FLOPs by an average of 58.6% against an extensive list of baselines. These improvements in performance and efficiency generalize to different pre-trained LMs and larger models, paving the way for more efficient and effective self-improving LMs.
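
To make the two sampling principles concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm: the abstract does not give the mechanism, so the class `AdaptiveSampler`, the priority key (least recently sampled first, then lowest win rate), and the accuracy-scaled refresh rule are all assumptions chosen for illustration. Rationale generation and fine-tuning are abstracted into a hypothetical `solve` callback that returns whether the model answered an observation correctly.

```python
import random


def uniform_counts(n_obs, batch_size, n_iters, seed=0):
    """Baseline: draw each batch uniformly at random and count draws per observation."""
    rng = random.Random(seed)
    counts = [0] * n_obs
    for _ in range(n_iters):
        for i in rng.sample(range(n_obs), batch_size):
            counts[i] += 1
    return counts


class AdaptiveSampler:
    """Toy priority sampler (hypothetical, not the published AdaSTaR procedure)."""

    def __init__(self, n_obs):
        self.last_sampled = [-1] * n_obs  # iteration of the last priority refresh
        self.wins = [0] * n_obs           # number of correct attempts
        self.tries = [0] * n_obs          # number of attempts

    def win_rate(self, i):
        return self.wins[i] / self.tries[i] if self.tries[i] else 0.0

    def select(self, batch_size):
        # Diversity: least recently refreshed observations come first.
        # Tie-break: observations with a lower win rate (harder) come first.
        order = sorted(
            range(len(self.last_sampled)),
            key=lambda i: (self.last_sampled[i], self.win_rate(i)),
        )
        return order[:batch_size]

    def update(self, batch, correct, iteration):
        for i, ok in zip(batch, correct):
            self.tries[i] += 1
            self.wins[i] += int(ok)
        # Curriculum (assumed rule): refresh only a fraction of the batch equal
        # to the batch accuracy, so a weak model is shown the rest again soon.
        accuracy = sum(correct) / len(batch)
        n_refresh = int(len(batch) * accuracy)
        for i in batch[:n_refresh]:
            self.last_sampled[i] = iteration


def adaptive_counts(n_obs, batch_size, n_iters, solve):
    """Run the toy sampler and count how often each observation is drawn."""
    sampler = AdaptiveSampler(n_obs)
    counts = [0] * n_obs
    for t in range(n_iters):
        batch = sampler.select(batch_size)
        correct = [solve(i) for i in batch]
        for i in batch:
            counts[i] += 1
        sampler.update(batch, correct, t)
    return counts
```

With a `solve` callback that always succeeds, the toy sampler cycles through the dataset, so every observation is drawn equally often, whereas uniform sampling generally leaves some observations drawn more often than others. When batch accuracy is low, fewer priorities are refreshed and the unrefreshed observations are drawn again on the next iteration.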

