AI · Feb 19

Predictive Batch Scheduling: Accelerating Language Model Training Through Loss-Aware Sample Prioritization

arXiv:2602.17066v1 · h-index: 4

Originality: Incremental advance

AI Analysis

The paper addresses slow training convergence, a practical concern for language model developers, and represents an incremental optimization of existing curriculum learning approaches.

The paper tackles the problem of accelerating language model training by introducing Predictive Batch Scheduling (PBS), which prioritizes high-loss samples during batch construction using a lightweight linear predictor. The reported result is 6-13% faster convergence, measured by evaluation loss, in experiments with a 130M-parameter transformer.

We introduce Predictive Batch Scheduling (PBS), a novel training optimization technique that accelerates language model convergence by dynamically prioritizing high-loss samples during batch construction. Unlike curriculum learning approaches that require predefined difficulty metrics or hard example mining methods that demand expensive per-sample loss tracking, PBS employs a lightweight linear predictor trained online to estimate sample difficulty from static token-level features. Our predictor achieves 0.44 correlation with actual loss using only four simple features: token frequency, sequence length, vocabulary diversity, and rare token ratio. Experiments on a 130M parameter transformer demonstrate that PBS achieves 6-13% faster convergence measured by evaluation loss across training checkpoints, with the predictor's correlation improving from 0.14 to 0.44 over 10,000 training steps. These results validate that token frequency statistics encode meaningful information about sample difficulty, enabling effective curriculum learning with negligible computational overhead.
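
The abstract does not give implementation details, so the following is only a minimal Python sketch of how a loss-aware scheduler of this kind might be assembled. Everything named here is an assumption made for illustration and not the authors' code: the function and class names (`extract_features`, `OnlineLinearPredictor`, `select_batch`), the exact feature definitions, the rare-token threshold, the length normalization, the plain SGD update, and the top-k selection rule over a candidate pool.

```python
import math


def extract_features(tokens, unigram_prob, max_len=512,
                     rare_threshold=1e-3, floor=1e-8):
    """Four static features for one token sequence (illustrative definitions).

    unigram_prob maps token -> corpus probability; unseen tokens fall back
    to `floor` so the logarithm is always defined.
    """
    probs = [unigram_prob.get(t, floor) for t in tokens]
    n = len(tokens)
    mean_log_freq = sum(math.log(p) for p in probs) / n   # token frequency
    rel_length = n / max_len                               # sequence length, scaled
    diversity = len(set(tokens)) / n                       # vocabulary diversity
    rare_ratio = sum(p < rare_threshold for p in probs) / n  # rare token ratio
    return [mean_log_freq, rel_length, diversity, rare_ratio]


class OnlineLinearPredictor:
    """Linear model updated by one SGD step on squared error per observation."""

    def __init__(self, n_features, lr=0.01):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, observed_loss):
        # Gradient of 0.5 * (prediction - observed_loss)^2 w.r.t. w and b.
        err = self.predict(x) - observed_loss
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def select_batch(pool_features, predictor, batch_size):
    """Return indices of the `batch_size` pool samples with highest predicted loss."""
    scores = [predictor.predict(x) for x in pool_features]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]
```

In a training loop built on this sketch, one would compute features for a candidate pool, call `select_batch` to choose the batch, run the usual forward and backward pass, and then feed the losses observed for that batch to `update`. Only samples that were actually trained on contribute losses, so no separate per-sample loss tracking pass is needed.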
