LG OC ML Oct 1, 2020

Momentum via Primal Averaging: Theoretical Insights and Learning Rate Schedules for Non-Convex Optimization

arXiv:2010.00406v4 29 citations
Originality: Incremental advance
AI Analysis

This work offers theoretical insights for machine learning practitioners using momentum methods in non-convex optimization, though it is incremental as it builds on existing momentum techniques.

The paper tackles the problem of understanding when and why momentum methods outperform stochastic gradient descent (SGD) in non-convex optimization, such as training deep neural networks, by developing a tighter Lyapunov analysis that gives precise insights into hyper-parameter schedules.

Momentum methods are now used pervasively within the machine learning community for training non-convex models such as deep neural networks. Empirically, they outperform traditional stochastic gradient descent (SGD) approaches. In this work we develop a Lyapunov analysis of SGD with momentum (SGD+M), by utilizing an equivalent rewriting of the method known as the stochastic primal averaging (SPA) form. This analysis is much tighter than previous theory in the non-convex case, and due to this we are able to give precise insights into when SGD+M may outperform SGD, and what hyper-parameter schedules will work and why.
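The equivalence between SGD+M and a primal averaging form can be checked numerically. The sketch below is illustrative only and is not taken from the paper: it assumes one common way of writing SGD+M (m ← βm + g, x ← x − αm, with the momentum buffer starting at zero) and a primal averaging update (z ← z − ηg, x ← (1 − c)x + cz, with z starting at x). Under those assumptions and constant hyper-parameters, the mapping c = 1 − β, η = α / (1 − β) makes the two iterate sequences coincide. The toy objective, the noise model, and all function names are hypothetical.

```python
import numpy as np

def grad(x):
    """Gradient of a toy non-convex objective f(x) = sum(x**2 + 3*sin(x)**2)."""
    return 2.0 * x + 3.0 * np.sin(2.0 * x)

def sgd_momentum(x0, alpha, beta, noise):
    """Assumed SGD+M form: m <- beta*m + g, then x <- x - alpha*m (m starts at 0)."""
    x, m = x0.copy(), np.zeros_like(x0)
    for xi in noise:
        g = grad(x) + xi          # noisy gradient evaluated at x
        m = beta * m + g
        x = x - alpha * m
    return x

def primal_averaging(x0, eta, c, noise):
    """Assumed SPA form: z <- z - eta*g, then x <- (1-c)*x + c*z (z starts at x0)."""
    x, z = x0.copy(), x0.copy()
    for xi in noise:
        g = grad(x) + xi          # gradient is evaluated at x, not at z
        z = z - eta * g
        x = (1.0 - c) * x + c * z
    return x

rng = np.random.default_rng(0)
x0 = rng.normal(size=5)
noise = 0.1 * rng.normal(size=(200, 5))   # same noise sequence for both runs

alpha, beta = 0.01, 0.9
x_sgdm = sgd_momentum(x0, alpha, beta, noise)
# Parameter mapping assumed here: c = 1 - beta, eta = alpha / (1 - beta)
x_spa = primal_averaging(x0, eta=alpha / (1.0 - beta), c=1.0 - beta, noise=noise)

# Largest coordinate-wise gap between the two final iterates
print(np.max(np.abs(x_sgdm - x_spa)))
```

Both runs share the same noise sequence, so any gap between the final iterates comes from floating-point rounding only. With time-varying hyper-parameters the mapping between the two forms is more involved, and this sketch does not cover it.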

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
