OCLGMLAug 28, 2020

ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm

arXiv:2008.12690v411 citations
AI Analysis

This work provides a near-optimal algorithm for stochastic optimization, addressing both finite-sample and asymptotic efficiency for researchers and practitioners in machine learning and statistics.

The authors tackled the problem of strongly convex and smooth unconstrained optimization with stochastic first-order algorithms by proposing ROOT-SGD, which achieves state-of-the-art performance with optimal statistical risk and a sharp higher-order term scaling at O(n^{-3/2}) in nonasymptotic settings, and asymptotically converges to a Gaussian limit with Cramér-Rao optimal covariance.

We study the problem of solving strongly convex and smooth unconstrained optimization problems using stochastic first-order algorithms. We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable, recursive averaging of past stochastic gradients. We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense. On the non-asymptotic side, we prove risk bounds on the last iterate of ROOT-SGD with leading-order terms that match the optimal statistical risk with a unity pre-factor, along with a higher-order term that scales at the sharp rate of $O(n^{-3/2})$ under the Lipschitz condition on the Hessian matrix. On the asymptotic side, we show that when a mild, one-point Hessian continuity condition is imposed, the rescaled last iterate of (multi-epoch) ROOT-SGD converges asymptotically to a Gaussian limit with the Cramér-Rao optimal asymptotic covariance, for a broad range of step-size choices.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes