Natasha 2: Faster Non-Convex Optimization Than SGD
This provides a faster optimization method for non-convex functions, which is incremental but offers concrete improvements in computational efficiency for machine learning practitioners.
The paper tackles the problem of training smooth neural networks to find approximate local minima, achieving a faster rate of O(ε^{-3.25}) backpropagations compared to the previous best of O(ε^{-4}) by SGD.
We design a stochastic algorithm to train any smooth neural network to $\varepsilon$-approximate local minima, using $O(\varepsilon^{-3.25})$ backpropagations. The best result was essentially $O(\varepsilon^{-4})$ by SGD. More broadly, it finds $\varepsilon$-approximate local minima of any smooth nonconvex function in rate $O(\varepsilon^{-3.25})$, with only oracle access to stochastic gradients.