OCLGMLOct 21, 2021

Towards Noise-adaptive, Problem-adaptive (Accelerated) Stochastic Gradient Descent

arXiv:2110.11442v319 citations
Originality Incremental advance
AI Analysis

This work addresses the challenge of tuning hyperparameters in optimization algorithms for machine learning practitioners, though it is incremental as it builds on existing SGD methods with specific adaptations.

The paper tackles the problem of making stochastic gradient descent (SGD) adaptive to noise and problem-dependent constants, achieving rates such as $ ilde{O} \left(\exp \left( rac{-T}{\kappa} ight) + rac{\sigma^2}{T} ight)$ for SGD and $ ilde{O} \left(\exp \left( rac{-T}{\sqrt{\kappa}} ight) + rac{\sigma^2}{T} ight)$ for accelerated SGD without prior knowledge of noise.

We aim to make stochastic gradient descent (SGD) adaptive to (i) the noise $σ^2$ in the stochastic gradients and (ii) problem-dependent constants. When minimizing smooth, strongly-convex functions with condition number $κ$, we prove that $T$ iterations of SGD with exponentially decreasing step-sizes and knowledge of the smoothness can achieve an $\tilde{O} \left(\exp \left( \frac{-T}κ \right) + \frac{σ^2}{T} \right)$ rate, without knowing $σ^2$. In order to be adaptive to the smoothness, we use a stochastic line-search (SLS) and show (via upper and lower-bounds) that SGD with SLS converges at the desired rate, but only to a neighbourhood of the solution. On the other hand, we prove that SGD with an offline estimate of the smoothness converges to the minimizer. However, its rate is slowed down proportional to the estimation error. Next, we prove that SGD with Nesterov acceleration and exponential step-sizes (referred to as ASGD) can achieve the near-optimal $\tilde{O} \left(\exp \left( \frac{-T}{\sqrtκ} \right) + \frac{σ^2}{T} \right)$ rate, without knowledge of $σ^2$. When used with offline estimates of the smoothness and strong-convexity, ASGD still converges to the solution, albeit at a slower rate. We empirically demonstrate the effectiveness of exponential step-sizes coupled with a novel variant of SLS.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes