ML · LG · Jun 11, 2019

ADASS: Adaptive Sample Selection for Training Acceleration

arXiv:1906.04819v2 · 1 citation
AI Analysis

The paper addresses training acceleration for machine learning practitioners, but its contribution is incremental because it builds on existing optimization methods.

The paper tackles the inefficiency of visiting the full training set in every epoch of SGD and its variants. It proposes ADASS, which adaptively selects training subsets based on the Lipschitz constants of the per-sample loss functions. In empirical tests on shallow and deep models, ADASS accelerates training while achieving comparable accuracy.

Stochastic gradient descent (SGD) and its variants, including some accelerated variants, have become popular for training in machine learning. However, in SGD and all of its existing variants, the sample size in each iteration (epoch) of training is the same as the size of the full training set. In this paper, we propose a new method, called adaptive sample selection (ADASS), for training acceleration. During different epochs of training, ADASS only needs to visit different training subsets, which are adaptively selected from the full training set according to the Lipschitz constants of the loss functions on samples. This means that in ADASS the sample size in each epoch of training can be smaller than the size of the full training set, because some samples are discarded. ADASS can be seamlessly integrated with existing optimization methods, such as SGD and momentum SGD, for training acceleration. Theoretical results show that the learning accuracy of ADASS is comparable to that of its counterparts trained on the full training set. Furthermore, empirical results on both shallow and deep models show that ADASS can accelerate the training process of existing methods without sacrificing accuracy.
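The general idea of visiting only a selected subset in each epoch can be illustrated with a minimal sketch. The abstract does not state the actual selection rule, so the score used here (the per-sample gradient norm of a squared loss) is an assumed stand-in for the Lipschitz-based criterion, and the names `select_subset`, `sgd_with_selection`, and `keep_fraction` are hypothetical. This is not the ADASS algorithm itself.

```python
import numpy as np


def select_subset(X, y, w, keep_fraction):
    """Return indices of the samples to visit in the next epoch.

    ASSUMPTION: samples are ranked by the gradient norm of the squared
    loss 0.5 * (x @ w - y) ** 2, which equals |residual| * ||x||.
    This is an illustrative proxy, not the paper's Lipschitz-based rule.
    """
    residual = X @ w - y
    scores = np.abs(residual) * np.linalg.norm(X, axis=1)
    k = max(1, int(np.ceil(keep_fraction * len(y))))
    # Keep the k highest-scoring samples and discard the rest.
    return np.argsort(scores)[-k:]


def sgd_with_selection(X, y, epochs=20, lr=0.01, keep_fraction=0.5, seed=0):
    """Plain SGD for linear least squares that re-selects a subset each epoch."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    visited = []  # number of samples visited in each epoch
    for _ in range(epochs):
        idx = select_subset(X, y, w, keep_fraction)
        rng.shuffle(idx)  # visit the selected samples in random order
        for i in idx:
            grad = (X[i] @ w - y[i]) * X[i]
            w -= lr * grad
        visited.append(len(idx))
    return w, visited
```

With `keep_fraction=0.5`, each epoch visits half of the training set, so the per-epoch cost of the update loop is halved at the price of recomputing the scores. Whether accuracy is preserved depends on the selection rule, which is what the paper's theoretical results address.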
