CV | AI | Dec 2, 2022

CLIP: Train Faster with Less Data

arXiv:2212.01452v3 | 8 citations | h-index: 27
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for faster, more data-efficient training in machine learning, particularly for domain-specific applications such as crowd density estimation. The contribution is incremental: it builds on existing data-centric approaches rather than introducing a new one.

The paper tackles deep learning's dependence on large datasets by proposing CLIP, a method that combines curriculum learning with dataset pruning to improve training efficiency and accuracy. In crowd density estimation experiments, it reports shorter convergence time and better generalization.

Deep learning models require an enormous amount of data for training. Recently, however, machine learning has been shifting from model-centric to data-centric approaches. Data-centric approaches focus on refining and improving the quality of the data, rather than redesigning model architectures, to improve learning performance. In this paper, we propose CLIP (Curriculum Learning with Iterative data Pruning). CLIP combines two data-centric approaches, curriculum learning and dataset pruning, to improve model accuracy and convergence speed. The proposed scheme applies loss-aware dataset pruning to iteratively remove the least significant samples, progressively reducing the size of the effective dataset during curriculum learning. Extensive experiments on crowd density estimation models validate the idea of combining the two approaches, showing reduced convergence time and improved generalization. To our knowledge, the idea of data pruning as an embedded process in curriculum learning is novel.
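The loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state how sample significance is scored or how the curriculum is scheduled, so the sketch assumes that "least significant" means "lowest current loss" and that the curriculum is a fixed easy-to-hard ordering. The toy 1-D linear model and the names `clip_sketch`, `prune_lowest_loss`, and `difficulty` are hypothetical and exist only for illustration.

```python
def per_sample_loss(w, b, x, y):
    """Squared error of a toy 1-D linear model on one sample."""
    return (w * x + b - y) ** 2


def train_one_epoch(w, b, data, indices, lr=0.05):
    """Plain SGD over the current effective dataset, in the given order."""
    for i in indices:
        x, y = data[i]
        err = w * x + b - y
        w -= lr * 2 * err * x
        b -= lr * 2 * err
    return w, b


def prune_lowest_loss(w, b, data, indices, fraction):
    """Drop the `fraction` of samples with the smallest current loss.

    Assumption: 'least significant' is read as 'already well fit'.
    The surviving samples keep their curriculum order.
    """
    ranked = sorted(indices, key=lambda i: per_sample_loss(w, b, *data[i]))
    n_drop = int(len(ranked) * fraction)
    keep = set(ranked[n_drop:])
    return [i for i in indices if i in keep]


def clip_sketch(data, difficulty, stages=3, epochs_per_stage=5,
                prune_fraction=0.25):
    """Alternate curriculum-ordered training with loss-aware pruning."""
    # Assumed curriculum: present samples from easy to hard.
    active = sorted(range(len(data)), key=lambda i: difficulty[i])
    w, b = 0.0, 0.0
    sizes = []
    for _ in range(stages):
        for _ in range(epochs_per_stage):
            w, b = train_one_epoch(w, b, data, active)
        # Shrink the effective dataset before the next stage.
        active = prune_lowest_loss(w, b, data, active, prune_fraction)
        sizes.append(len(active))
    return w, b, sizes


# Toy data: y = 2x + 1, with |x| standing in for sample difficulty.
xs = [i / 10 - 0.95 for i in range(20)]
data = [(x, 2 * x + 1) for x in xs]
difficulty = [abs(x) for x in xs]
w, b, sizes = clip_sketch(data, difficulty)
print(sizes)
```

With 20 samples and a pruning fraction of 0.25, the effective dataset shrinks to 15, 12, and then 9 samples over the three stages, so each later epoch touches fewer samples. In a real setting the toy model would be replaced by the network being trained, and the difficulty score by whatever curriculum criterion the task calls for.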
