CLIP: Train Faster with Less Data
This work addresses the need for faster and more data-efficient training in machine learning, particularly for domain-specific applications like crowd density estimation, but it is incremental as it builds on existing data-centric approaches.
The paper tackles the reliance of deep learning models on large datasets by proposing CLIP, a method that combines curriculum learning and dataset pruning to improve training efficiency and accuracy. In crowd density estimation experiments, it reduces convergence time and improves generalization.
Deep learning models require an enormous amount of data for training. Recently, however, machine learning has been shifting from model-centric to data-centric approaches. In data-centric approaches, the focus is on refining and improving the quality of the data, rather than redesigning model architectures, to improve the learning performance of the models. In this paper, we propose CLIP, i.e., Curriculum Learning with Iterative data Pruning. CLIP combines two data-centric approaches, curriculum learning and dataset pruning, to improve model accuracy and convergence speed. The proposed scheme applies loss-aware dataset pruning to iteratively remove the least significant samples, progressively reducing the size of the effective dataset during curriculum learning. Extensive experiments on crowd density estimation models validate the idea of combining the two approaches: convergence time is reduced and generalization improves. To our knowledge, embedding data pruning as a process within curriculum learning is novel.
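The iterative, loss-aware pruning described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say which samples count as "least significant", how many are removed per step, or how the curriculum is ordered, so the sketch assumes that the lowest-loss samples are pruned, that a fixed fraction is removed after each stage, and that training is supplied by a hypothetical `train_stage` callback. The names `prune_lowest_loss`, `curriculum_with_pruning`, and `prune_fraction` are illustrative only.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def prune_lowest_loss(
    indices: Sequence[int], losses: Dict[int, float], prune_fraction: float
) -> List[int]:
    """Drop the prune_fraction of samples with the smallest loss.

    Assumption: a low loss marks a sample as "least significant".
    The surviving indices are returned in their original order.
    """
    if not 0.0 <= prune_fraction < 1.0:
        raise ValueError("prune_fraction must be in [0, 1)")
    n_drop = int(len(indices) * prune_fraction)  # rounds down
    # sorted() is stable, so samples with equal loss keep their input order.
    ranked = sorted(indices, key=lambda i: losses[i])
    dropped = set(ranked[:n_drop])
    return [i for i in indices if i not in dropped]


def curriculum_with_pruning(
    n_samples: int,
    n_stages: int,
    prune_fraction: float,
    train_stage: Callable[[int, List[int]], Dict[int, float]],
) -> Tuple[List[int], List[int]]:
    """Run n_stages training stages, shrinking the active set after each one.

    train_stage(stage, active) is a hypothetical callback that trains the
    model on the active samples and returns a per-sample loss for each.
    Returns the final active indices and the dataset size before each stage
    plus the size after the last one.
    """
    active = list(range(n_samples))
    sizes = [len(active)]
    for stage in range(n_stages):
        losses = train_stage(stage, active)
        active = prune_lowest_loss(active, losses, prune_fraction)
        sizes.append(len(active))
    return active, sizes
```

In a real training loop, `train_stage` would run one or more epochs on the active subset and record each sample's loss. The sketch only captures how the effective dataset shrinks from stage to stage.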