OCCCLG Feb 1, 2019

Sharp Analysis for Nonconvex SGD Escaping from Saddle Points

arXiv:1902.00247v2 (115 citations)
AI Analysis

This paper offers a theoretical advance for researchers in optimization and machine learning: it shows that SGD can match the rate of accelerated methods without extra techniques, though the contribution is incremental in that it refines existing analysis.

The paper tackles the problem of nonconvex stochastic gradient descent (SGD) escaping saddle points efficiently, proving that SGD achieves an approximate second-order stationary point in $\tilde{O}(\epsilon^{-3.5})$ stochastic gradient computations, which improves upon the classical belief of $O(\epsilon^{-4})$.
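For reference, the term "approximate second-order stationary point" is usually formalized as follows. This is the standard definition from the nonconvex optimization literature, stated here on the assumption that the paper uses it in the same form; the symbols $x$, $f$, and $\delta$ are introduced only for illustration. A point $x$ is an $(\epsilon, \delta)$-approximate second-order stationary point of $f$ if

$$
\|\nabla f(x)\| \le \epsilon
\quad \text{and} \quad
\lambda_{\min}\left(\nabla^2 f(x)\right) \ge -\delta .
$$

The first condition rules out points with a large gradient. The second rules out strict saddle points, where the Hessian has a direction of substantial negative curvature. The result summarized above concerns the case $\delta = O(\epsilon^{0.5})$.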

In this paper, we give a sharp analysis for Stochastic Gradient Descent (SGD) and prove that SGD is able to efficiently escape from saddle points and find an $(ε, O(ε^{0.5}))$-approximate second-order stationary point in $\tilde{O}(ε^{-3.5})$ stochastic gradient computations for generic nonconvex optimization problems, when the objective function satisfies gradient-Lipschitz, Hessian-Lipschitz, and dispersive noise assumptions. This result subverts the classical belief that SGD requires at least $O(ε^{-4})$ stochastic gradient computations for obtaining an $(ε,O(ε^{0.5}))$-approximate second-order stationary point. Such SGD rate matches, up to a polylogarithmic factor of problem-dependent parameters, the rate of most accelerated nonconvex stochastic optimization algorithms that adopt additional techniques, such as Nesterov's momentum acceleration, negative curvature search, as well as quadratic and cubic regularization tricks. Our novel analysis gives new insights into nonconvex SGD and can be potentially generalized to a broad class of stochastic optimization algorithms.
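As a toy illustration of the phenomenon the abstract describes, the sketch below runs plain SGD from exactly a saddle point and checks the two conditions of an approximate second-order stationary point (small gradient norm, no strongly negative Hessian eigenvalue). Everything in it is an assumption made for illustration: the test function $f(x, y) = 0.5x^2 - 0.5y^2 + 0.25y^4$, the isotropic Gaussian gradient noise standing in for the paper's dispersive noise assumption, and the step size, noise level, and iteration count. It is not the paper's algorithm setup and says nothing about the $\tilde{O}(ε^{-3.5})$ rate.

```python
import math
import random


def grad(x, y):
    """Exact gradient of f(x, y) = 0.5*x**2 - 0.5*y**2 + 0.25*y**4."""
    return x, -y + y**3


def hessian_min_eig(x, y):
    """Smallest Hessian eigenvalue; the Hessian is diag(1, 3*y**2 - 1)."""
    return min(1.0, 3.0 * y**2 - 1.0)


def sgd(steps=2000, lr=0.05, sigma=0.1, seed=0):
    """Plain SGD started at the saddle point (0, 0) of the toy function."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    for _ in range(steps):
        gx, gy = grad(x, y)
        # Stochastic gradient: exact gradient plus isotropic Gaussian noise.
        x -= lr * (gx + rng.gauss(0.0, sigma))
        y -= lr * (gy + rng.gauss(0.0, sigma))
    return x, y


x, y = sgd()
gx, gy = grad(x, y)
grad_norm = math.hypot(gx, gy)      # first-order condition
min_eig = hessian_min_eig(x, y)     # second-order condition
print(f"final point: ({x:.3f}, {y:.3f})")
print(f"gradient norm: {grad_norm:.3f}, min Hessian eigenvalue: {min_eig:.3f}")
```

At the starting point the exact gradient is zero and the Hessian has eigenvalue $-1$, so noiseless gradient descent would never move. The gradient noise pushes the iterate along the negative-curvature direction, and it then settles near one of the two local minima at $(0, \pm 1)$, where the smallest Hessian eigenvalue is positive.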

