ML · LG · Feb 25, 2020

Statistical Adaptive Stochastic Gradient Methods

arXiv:2002.10597v1 · 13 citations
AI Analysis

This paper addresses the need for robust, autonomous learning rate scheduling in deep learning optimization, though it appears incremental relative to prior adaptive methods.

The authors address automatic learning rate scheduling in stochastic gradient methods with SALSA, which combines a smoothed stochastic line search for warm-up with a statistical test that decides when to decrease the learning rate. In their deep learning experiments, the method matched the performance of the best hand-tuned schedules.

We propose a statistical adaptive procedure called SALSA for automatically scheduling the learning rate (step size) in stochastic gradient methods. SALSA first uses a smoothed stochastic line-search procedure to gradually increase the learning rate, then automatically switches to a statistical method to decrease the learning rate. The line search procedure "warms up" the optimization process, reducing the need for expensive trial and error in setting an initial learning rate. The method for decreasing the learning rate is based on a new statistical test for detecting stationarity when using a constant step size. Unlike in prior work, our test applies to a broad class of stochastic gradient algorithms without modification. The combined method is highly robust and autonomous, and it matches the performance of the best hand-tuned learning rate schedules in our experiments on several deep learning tasks.
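
To make the idea of "detecting stationarity under a constant step size" concrete, here is a minimal sketch on a toy problem. It is not the paper's test, whose details the abstract does not give. It relies on one generic fact: for any update x_{k+1} = x_k - lr * d_k, if the iterates have reached a stationary distribution then E[x_k^2] stops changing, which implies E[x_k * d_k - (lr / 2) * d_k^2] = 0. The sketch collects that statistic and drops the learning rate once a naive confidence interval for its mean contains zero. The function names, the 1-D quadratic objective, the burn-in rule, the drop factor, and the i.i.d.-style standard error (which ignores correlation between consecutive samples) are all assumptions made for illustration.

```python
import math
import random
import statistics


def looks_stationary(samples, z=1.96):
    """Return True if a normal-approximation CI for the mean contains 0.

    Only the second half of `samples` is used; the first half is treated
    as burn-in. The standard error assumes independent samples, which is
    a simplification for illustration.
    """
    tail = samples[len(samples) // 2:]
    if len(tail) < 2:
        return False
    mean = statistics.mean(tail)
    stderr = statistics.stdev(tail) / math.sqrt(len(tail))
    return abs(mean) <= z * stderr


def run_sgd(x0=10.0, lr=0.5, steps=5000, check_every=100, drop=0.1, seed=0):
    """SGD on f(x) = x^2 / 2 with noisy gradients and test-driven lr drops."""
    rng = random.Random(seed)
    x, samples, drops = x0, [], []
    for k in range(1, steps + 1):
        g = x + rng.gauss(0.0, 1.0)               # noisy gradient of x^2 / 2
        samples.append(x * g - 0.5 * lr * g * g)  # mean is 0 at stationarity
        x -= lr * g
        if k % check_every == 0 and looks_stationary(samples):
            lr *= drop            # progress has stalled at this step size
            drops.append(k)
            samples = []          # restart the statistic for the new lr
    return x, lr, drops


if __name__ == "__main__":
    x, lr, drops = run_sgd()
    print("final x:", x, "final lr:", lr, "lr drops at steps:", drops)
```

While the iterate is still far from the minimum, the statistic is clearly positive and the test does not fire. Once the noise dominates, its mean is close to zero and the learning rate is reduced. A practical version would need a variance estimate that accounts for correlated samples.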

Code Implementations: 1 repo
