OCDSLGNEMLAug 29, 2017

Natasha 2: Faster Non-Convex Optimization Than SGD

arXiv:1708.08694v4258 citations
Originality Highly original
AI Analysis

This provides a faster optimization method for non-convex functions, which is incremental but offers concrete improvements in computational efficiency for machine learning practitioners.

The paper tackles the problem of training smooth neural networks to find approximate local minima, achieving a faster rate of O(ε^{-3.25}) backpropagations compared to the previous best of O(ε^{-4}) by SGD.

We design a stochastic algorithm to train any smooth neural network to $\varepsilon$-approximate local minima, using $O(\varepsilon^{-3.25})$ backpropagations. The best result was essentially $O(\varepsilon^{-4})$ by SGD. More broadly, it finds $\varepsilon$-approximate local minima of any smooth nonconvex function in rate $O(\varepsilon^{-3.25})$, with only oracle access to stochastic gradients.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes