LG · AI · ML · Sep 27, 2025

Data-Efficient Training by Evolved Sampling

arXiv:2509.23461v1 · 1 citation · h-index: 1
Originality: Incremental advance
AI Analysis

This paper addresses data efficiency for machine learning practitioners, offering a plug-and-play method that delivers an incremental improvement in training speed.

The paper tackles the problem of accelerating training by selecting informative data samples. It proposes Evolved Sampling (ES), which reduces back propagation time and saves up to nearly 45% of wall-clock time while maintaining model performance.

Data selection is designed to accelerate learning with preserved performance. To achieve this, a fundamental thought is to identify informative data samples with significant contributions to the training. In this work, we propose Evolved Sampling (ES), a simple yet effective framework for dynamic sampling along the training process. This method conducts batch-level data selection based on the dynamics of losses and augmented loss differences, which enables flexible frequency tuning, and hence significantly reduces the back propagation time with maintained model performance. Due to its conciseness, ES is also readily extensible to incorporate set-level data selection (to form ES with pruning, ESWP) for further accelerations. As a plug-and-play framework, ES(WP) consistently achieves lossless training accelerations across various pre-training and post-training tasks, saving up to nearly 45% wall-clock time. Our results motivate further investigations on the data efficiency aspect of modern large-scale machine learning.
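To make the idea of batch-level selection from loss dynamics concrete, here is a minimal NumPy sketch. It is not the paper's algorithm: the score formula (an exponential moving average of the loss plus a weighted absolute loss difference), the function names `update_scores` and `select_from_batch`, and the parameters `beta` and `gamma` are all assumptions made for illustration. It only shows the general pattern of scoring samples from their loss history and back-propagating through a selected subset of each batch.

```python
import numpy as np


def update_scores(scores, prev_losses, losses, beta=0.9, gamma=0.5):
    """Hypothetical per-sample score (an assumption, not the paper's formula).

    Combines an exponential moving average of the loss with a term
    proportional to the absolute change in loss since the last visit.
    """
    diff = np.abs(losses - prev_losses)
    return beta * scores + (1.0 - beta) * losses + gamma * diff


def select_from_batch(scores, k, rng, eps=1e-12):
    """Sample k distinct positions from a batch, weighted by score."""
    weights = scores + eps  # keep every probability strictly positive
    probs = weights / weights.sum()
    return rng.choice(len(scores), size=k, replace=False, p=probs)


# Toy usage: one batch of 8 samples, keep 4 for the backward pass.
rng = np.random.default_rng(0)
prev_losses = rng.uniform(0.5, 2.0, size=8)  # losses seen at the previous visit
losses = rng.uniform(0.5, 2.0, size=8)       # losses from the current forward pass
scores = update_scores(np.zeros(8), prev_losses, losses)
selected = select_from_batch(scores, k=4, rng=rng)
# Only the samples at `selected` would be used for back propagation.
```

In a real training loop the forward pass would still run on the full batch to obtain `losses`, and only the selected subset would go through the backward pass, which is where the time saving described above would come from. Set-level pruning (the ESWP variant) is not covered by this sketch.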
