FLOP-Efficient Training: Early Stopping Based on Test-Time Compute Awareness
This work targets the high compute cost of model training for AI developers, enabling faster deployment and more frequent updates. The contribution is incremental in that it builds on prior insights about test-time compute.
The paper aims to reduce training compute (FLOPs) for large language models. It introduces TTC-aware training, which pairs an intermediate checkpoint with additional test-time compute (TTC) to match or exceed the accuracy of a fully trained model, and reports up to 92% reductions in training FLOPs while maintaining or improving accuracy.
Scaling training compute, measured in FLOPs, has long been shown to improve the accuracy of large language models, yet training remains resource-intensive. Prior work shows that increasing test-time compute (TTC), for example through iterative sampling, can allow smaller models to rival or surpass much larger ones at lower overall cost. We introduce TTC-aware training, where an intermediate checkpoint and a corresponding TTC configuration can together match or exceed the accuracy of a fully trained model while requiring substantially fewer training FLOPs. Building on this insight, we propose an early stopping algorithm that jointly selects a checkpoint and TTC configuration to minimize training compute without sacrificing accuracy. To make this practical, we develop an efficient TTC evaluation method that avoids exhaustive search, and we formalize a break-even bound that identifies when increased inference compute compensates for reduced training compute. Experiments demonstrate up to 92% reductions in training FLOPs while maintaining, and in some cases improving, accuracy. These results highlight a new perspective for balancing training and inference compute in model development, enabling faster deployment cycles and more frequent model refreshes. Code will be publicly released.
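The joint selection and the break-even idea described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm. It uses a plain grid scan rather than the efficient TTC evaluation method, and it assumes that TTC means drawing K samples per query, with a baseline of one sample per query at the same per-sample cost. The names `Checkpoint`, `select_early_stop`, `accuracy_fn`, and `break_even_queries` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical record for a saved intermediate checkpoint."""
    step: int            # training step at which the checkpoint was saved
    train_flops: float   # cumulative training FLOPs spent up to this step


def select_early_stop(
    checkpoints: Sequence[Checkpoint],
    sample_budgets: Sequence[int],
    accuracy_fn: Callable[[Checkpoint, int], float],
    target_accuracy: float,
) -> Optional[Tuple[Checkpoint, int, float]]:
    """Return the first (checkpoint, K, accuracy) that reaches the target.

    Checkpoints are scanned by increasing training FLOPs and sample budgets
    by increasing K, so the first hit uses the least training compute and,
    for that checkpoint, the smallest K. Returns None if nothing qualifies.
    """
    for ckpt in sorted(checkpoints, key=lambda c: c.train_flops):
        for k in sorted(sample_budgets):
            acc = accuracy_fn(ckpt, k)  # assumed: validation accuracy with K samples
            if acc >= target_accuracy:
                return ckpt, k, acc
    return None


def break_even_queries(
    full_train_flops: float,
    ckpt_train_flops: float,
    flops_per_sample: float,
    k: int,
) -> float:
    """Number of queries at which extra inference FLOPs equal training FLOPs saved.

    Assumed cost model (not from the paper): the fully trained baseline uses
    one sample per query, the early-stopped checkpoint uses k samples.
    """
    saved = full_train_flops - ckpt_train_flops
    extra_per_query = (k - 1) * flops_per_sample
    if extra_per_query <= 0:
        return float("inf")  # no extra inference cost, so savings never erode
    return saved / extra_per_query
```

Under these assumptions, an early-stopped checkpoint costs less in total than the fully trained model as long as the expected number of served queries stays below the value returned by `break_even_queries`. In practice `accuracy_fn` would wrap an evaluation on held-out data, which is the expensive step the paper's efficient evaluation method is meant to reduce.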