LG, OC, ST, ML | Jun 11, 2020

AdaS: Adaptive Scheduling of Stochastic Gradients

arXiv:2006.06587v1 | 9 citations | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of learning rate scheduling for deep learning practitioners. It offers an automated method that reduces reliance on manual tuning, though the advance is incremental because it builds on existing adaptive techniques.

The paper tackles the problem of manually tuning learning rates in SGD by proposing AdaS, an adaptive scheduling algorithm. AdaS uses metrics derived from the singular values of convolution layers to adjust the learning rate according to knowledge gain. The authors report faster convergence and better generalization, with no need for a validation set to decide when to stop training.

The step-size used in Stochastic Gradient Descent (SGD) optimization is selected empirically in most training procedures. Moreover, the use of scheduled learning techniques such as Step-Decaying, Cyclical-Learning, and Warmup to tune the step-size requires extensive practical experience, offers limited insight into how the parameters update, and is not consistent across applications. This work attempts to answer a question of interest to both researchers and practitioners, namely "how much knowledge is gained in iterative training of deep neural networks?" Answering this question introduces two useful metrics derived from the singular values of the low-rank factorization of convolution layers in deep neural networks. We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS) that utilizes these derived metrics to adapt the SGD learning rate proportionally to the rate of change in knowledge gain over successive iterations. Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training. Code is available at https://github.com/mahdihosseini/AdaS.
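The abstract does not give formulas for the two metrics or for the update rule, so the following is only a rough sketch of the idea and not the authors' implementation. It assumes that "knowledge gain" can be approximated as a normalized sum of the retained singular values of an unfolded convolution kernel, that "mapping condition" is the ratio of the largest to the smallest retained singular value, and that the learning rate is updated as a decayed previous rate plus a term proportional to the change in gain. The function names, the relative threshold used in place of a proper low-rank factorization, and the parameters `beta` and `zeta` are all hypothetical.

```python
import numpy as np


def unfold_conv_weight(weight):
    """Flatten a 4-D conv kernel (out_ch, in_ch, kh, kw) into a 2-D matrix."""
    return weight.reshape(weight.shape[0], -1)


def layer_metrics(weight, rel_threshold=0.1):
    """Return (gain, condition) for one layer.

    Assumption: a simple relative threshold on the singular values stands in
    for the low-rank factorization mentioned in the abstract.
    """
    matrix = unfold_conv_weight(weight)
    # Singular values are returned in descending order.
    s = np.linalg.svd(matrix, compute_uv=False)
    if s[0] == 0.0:
        return 0.0, float("inf")
    kept = s[s >= rel_threshold * s[0]]
    # Assumed definition: mean of retained singular values relative to the
    # largest one, so the value lies in (0, 1].
    gain = float(kept.sum() / (len(s) * s[0]))
    # Assumed definition: ratio of largest to smallest retained singular value.
    condition = float(kept[0] / kept[-1])
    return gain, condition


def adas_like_step(lr_prev, gain_prev, gain_now, beta=0.8, zeta=1.0):
    """Hypothetical update: decay the old rate, add the change in gain."""
    return beta * lr_prev + zeta * (gain_now - gain_prev)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_before = rng.normal(size=(16, 8, 3, 3))
    w_after = w_before + 0.1 * rng.normal(size=w_before.shape)
    g0, _ = layer_metrics(w_before)
    g1, c1 = layer_metrics(w_after)
    print("gain:", g1, "condition:", c1)
    print("next lr:", adas_like_step(0.03, g0, g1))
```

In this sketch each layer gets its own gain value, so a training loop could call `layer_metrics` once per epoch on every convolution kernel and feed the successive gains into `adas_like_step`. The repository linked in the abstract is the place to check the actual definitions.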

Code Implementations: 2 repos
