Refined Coreset Selection: Towards Minimal Coreset Size under Model Performance Constraints
This work addresses the need for cost-effective data reduction in deep learning, although it is incremental in that it builds on existing coreset selection techniques.
The paper tackles the problem of finding the smallest coreset that still maintains model performance. It proposes a method that optimizes model performance and coreset size under a priority order and comes with a convergence guarantee. In experiments, it often achieves better performance with smaller coresets.
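One way to write this problem down is shown below. The notation is assumed here and does not come from the paper: a 0/1 selection mask $m$ over $n$ training points $x_i$, a per-example loss $\ell$, the model $\theta(m)$ fit on the selected points, and a hypothetical relative tolerance $\epsilon \ge 0$ on the full-data loss. The coreset size is minimized subject to the model trained on the coreset performing almost as well, on the full data, as the best achievable model.

```latex
\min_{m \in \{0,1\}^n} \; \|m\|_0
\quad \text{s.t.} \quad
\mathcal{L}\big(\theta(m)\big) \le (1+\epsilon)\,\mathcal{L}^\star,
\qquad
\theta(m) \in \arg\min_{\theta} \sum_{i=1}^{n} m_i\, \ell(x_i;\theta),

\text{where} \quad
\mathcal{L}(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ell(x_i;\theta),
\qquad
\mathcal{L}^\star = \min_{\theta} \mathcal{L}(\theta).
```

The constraint encodes the priority of performance over size. Setting $\epsilon = 0$ demands full-data performance exactly, and larger $\epsilon$ trades a bounded loss increase for a smaller coreset.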
Coreset selection is a powerful way to reduce computational costs and accelerate data processing for deep learning algorithms. It aims to identify a small subset of a large-scale dataset such that training only on this subset performs, in practice, on par with training on the full data. To minimize costs and maximize acceleration, practitioners often want to find the smallest possible coreset in realistic scenarios while maintaining comparable model performance. Motivated by this desideratum, we pose, for the first time, the problem of refined coreset selection, which seeks the minimal coreset size under model performance constraints. To address this problem, we propose a method that maintains a priority order between model performance and coreset size and optimizes both efficiently during coreset selection. Theoretically, we provide a convergence guarantee for the proposed method. Empirically, extensive experiments confirm its superiority over previous strategies, often yielding better model performance with smaller coreset sizes.
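The following is a minimal sketch of what "a priority order between model performance and coreset size" can mean, shown on a toy problem. It rests on choices that are not in the source. The model is a single constant `theta` and the loss is squared error. `eps` is a hypothetical relative tolerance on the full-data loss, and the function names are illustrative. The inner problem (fitting `theta` on the coreset picked by a 0/1 `mask`) has a closed form, the mean of the selected points. The outer objective evaluates that fit on the full data. Candidates are ranked by a Python tuple key, so performance within the tolerance takes priority over coreset size. This is not the paper's optimizer: it enumerates every mask, which is only feasible for a handful of points.

```python
import itertools
from statistics import fmean


def full_data_loss(data, mask):
    """Outer objective: mean squared error on the FULL dataset of a model fit on the coreset.

    Toy model (assumption): a single constant theta. The inner problem
    argmin_theta sum_i mask[i] * (x_i - theta)**2 is solved in closed form
    by the mean of the selected points.
    """
    selected = [x for x, keep in zip(data, mask) if keep]
    theta = fmean(selected)
    return fmean((x - theta) ** 2 for x in data)


def refined_coreset(data, eps):
    """Brute-force search for the smallest coreset meeting a performance tolerance.

    Ranking key, compared lexicographically (performance has priority):
      1) violates loss <= (1 + eps) * l_star ?  (False sorts before True)
      2) coreset size (smaller is better)
      3) full-data loss (tie-breaker)
    Enumerates all 2**n - 1 nonempty masks, so it only suits tiny n.
    """
    n = len(data)
    # In this toy model, fitting on all data minimizes the full-data loss.
    l_star = full_data_loss(data, [1] * n)
    threshold = (1 + eps) * l_star
    best_key, best_mask = None, None
    for mask in itertools.product((0, 1), repeat=n):
        if not any(mask):
            continue  # an empty coreset cannot be fit
        loss = full_data_loss(data, mask)
        key = (loss > threshold, sum(mask), loss)
        if best_key is None or key < best_key:
            best_key, best_mask = key, mask
    return best_mask, best_key


if __name__ == "__main__":
    data = [1.0, 2.0, 4.0, 5.0]
    for eps in (0.0, 0.5):
        mask, (violates, size, loss) = refined_coreset(data, eps)
        print(f"eps={eps}: mask={mask}, size={size}, loss={loss:.2f}, violates={violates}")
```

For `data = [1.0, 2.0, 4.0, 5.0]`, `eps=0.0` forces the coreset to reproduce the full-data mean of 3.0, which requires two points. With `eps=0.5`, a single point already satisfies the relaxed constraint. Ranking with a tuple key keeps the priority explicit: no reduction in size can compensate for a violated performance constraint.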