OC DS LG NE ML
Mar 17, 2016

Variance Reduction for Faster Non-Convex Optimization

arXiv:1603.05643v2
412 citations
AI Analysis

This work addresses a fundamental bottleneck in non-convex optimization for machine learning: reaching a stationary point efficiently. It offers the first theoretical improvement over the convergence rates of full gradient descent and stochastic gradient descent for this long-standing problem.

The paper tackles the problem of efficiently reaching a stationary point in non-convex optimization. For objectives that are sums of n smooth functions, it introduces a first-order minibatch stochastic method that converges at an O(1/ε) rate and is faster than full gradient descent by a factor of Ω(n^{1/3}).
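To make the rates above concrete, the following is a short worked comparison, not a statement taken from the paper. It assumes the usual convention that $\varepsilon$ bounds the squared gradient norm, that cost is counted in evaluations of a single component gradient $\nabla f_i$, and that the stated $\Omega(n^{1/3})$ speedup is a ratio of such evaluations.

```latex
% Assumed setting: finite-sum objective and an epsilon-stationary point.
f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x),
\qquad
\|\nabla f(x)\|^2 \le \varepsilon .

% Full gradient descent: O(1/eps) iterations, each touching all n components.
\text{GD cost} = O\!\left(\frac{1}{\varepsilon}\right) \cdot n
             = O\!\left(\frac{n}{\varepsilon}\right),
\qquad
\text{SGD cost} = O\!\left(\frac{1}{\varepsilon^{2}}\right).

% Dividing the GD cost by the stated speedup factor n^{1/3}:
\frac{n/\varepsilon}{n^{1/3}} = \frac{n^{2/3}}{\varepsilon} .
```

Under these assumptions, the method keeps the $1/\varepsilon$ dependence of gradient descent while reducing the dependence on $n$ from $n$ to $n^{2/3}$.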

We consider the fundamental problem in non-convex optimization of efficiently reaching a stationary point. In contrast to the convex case, in the long history of this basic problem, the only known theoretical results on first-order non-convex optimization remain full gradient descent, which converges in $O(1/\varepsilon)$ iterations for smooth objectives, and stochastic gradient descent, which converges in $O(1/\varepsilon^2)$ iterations for objectives that are sums of smooth functions. We provide the first improvement in this line of research. Our result is based on the variance reduction trick recently introduced to convex optimization, as well as a brand new analysis of variance reduction that is suitable for non-convex optimization. For objectives that are sums of smooth functions, our first-order minibatch stochastic method converges with an $O(1/\varepsilon)$ rate, and is faster than full gradient descent by $\Omega(n^{1/3})$. We demonstrate the effectiveness of our methods on empirical risk minimization with non-convex loss functions and on training neural nets.
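The variance reduction trick mentioned in the abstract can be sketched as a generic SVRG-style loop. This is an illustration under stated assumptions, not the paper's exact algorithm: the step size, epoch length, minibatch size, and the toy non-convex loss $f_i(x) = \log(1 + r_i^2/2)$ with $r_i = a_i^\top x - b_i$ are all hypothetical choices, and the helper names (`svrg`, `grad_i`, `full_grad`, `make_problem`) are invented for this example.

```python
import numpy as np

def grad_i(x, a_i, b_i):
    """Gradient of the toy non-convex loss f_i(x) = log(1 + r^2 / 2), r = a_i.x - b_i."""
    r = a_i @ x - b_i
    return (r / (1.0 + 0.5 * r * r)) * a_i

def full_grad(x, A, b):
    """Gradient of f(x) = (1/n) * sum_i f_i(x), computed over all n components."""
    r = A @ x - b
    return (A * (r / (1.0 + 0.5 * r * r))[:, None]).mean(axis=0)

def svrg(A, b, x0, step=0.02, epochs=30, inner=None, batch=4, seed=0):
    """Generic SVRG-style loop; all parameter defaults are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    inner = n if inner is None else inner
    x_snap = np.array(x0, dtype=float)
    for _ in range(epochs):
        # One full gradient per epoch, taken at the snapshot point.
        mu = full_grad(x_snap, A, b)
        x = x_snap.copy()
        for _ in range(inner):
            idx = rng.integers(0, n, size=batch)
            # Minibatch correction: difference of component gradients at x and at the snapshot.
            corr = np.mean(
                [grad_i(x, A[i], b[i]) - grad_i(x_snap, A[i], b[i]) for i in idx],
                axis=0,
            )
            # corr + mu is an unbiased estimate of the full gradient at x,
            # and its variance shrinks as x and x_snap get closer.
            x = x - step * (corr + mu)
        x_snap = x
    return x_snap

def make_problem(n=200, d=5, seed=1):
    """Synthetic data for the toy loss (hypothetical, for demonstration only)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))
    b = A @ (0.5 * np.ones(d)) + 0.1 * rng.standard_normal(n)
    return A, b

if __name__ == "__main__":
    A, b = make_problem()
    x0 = np.zeros(A.shape[1])
    x = svrg(A, b, x0)
    print("squared gradient norm at start:", np.sum(full_grad(x0, A, b) ** 2))
    print("squared gradient norm at end:  ", np.sum(full_grad(x, A, b) ** 2))
```

Each epoch costs one full pass plus `2 * batch * inner` component gradients, so the balance between epoch length and minibatch size determines the total gradient count. The values used here are placeholders and are not tuned to match the paper's analysis.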

