LG · OC · Aug 3, 2022

SGEM: stochastic gradient with energy and momentum

arXiv:2208.02208v1 · 4 citations · h-index: 35
Originality: Incremental advance
AI Analysis

This is an incremental improvement for machine learning practitioners seeking efficient optimization algorithms in non-convex settings.

The authors address non-convex stochastic optimization by proposing SGEM, which combines an energy variable with momentum. In their experiments, SGEM converges faster than AEGD and generalizes as well as or better than SGDM when training some deep neural networks.

In this paper, we propose SGEM, Stochastic Gradient with Energy and Momentum, to solve a large class of general non-convex stochastic optimization problems, based on the AEGD method that originated in the work [AEGD: Adaptive Gradient Descent with Energy. arXiv: 2010.05109]. SGEM incorporates both energy and momentum so as to inherit the advantages of each. We show that SGEM features an unconditional energy stability property, and derive energy-dependent convergence rates in the general non-convex stochastic setting, as well as a regret bound in the online convex setting. A lower threshold for the energy variable is also provided. Our experimental results show that SGEM converges faster than AEGD and generalizes better than, or at least as well as, SGDM in training some deep neural networks.
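To make the "energy plus momentum" idea concrete, here is a minimal pure-Python sketch of an SGEM-style update on a toy quadratic. The update rule is an assumption: it follows the usual AEGD-style form (an element-wise energy variable `r` that is divided by `1 + 2*lr*v**2` at each step, with `v` built from a bias-corrected momentum average) and has not been checked against the paper's pseudo-code. The names `sgem_step`, `quadratic`, `run_demo`, and all hyperparameter values are hypothetical and chosen only for illustration.

```python
import math


def sgem_step(theta, m, r, grad, loss, t, lr=0.1, beta=0.9, c=1.0):
    """One SGEM-style update (illustrative sketch, not the authors' code).

    theta, m, r, grad are lists of equal length; t is the 1-based step count.
    Assumes loss + c > 0 so the square root is well defined.
    """
    # Shared scaling: bias correction for the momentum average times 2*sqrt(f + c).
    scale = 2.0 * (1.0 - beta ** t) * math.sqrt(loss + c)
    new_theta, new_m, new_r = [], [], []
    for th, mi, ri, gi in zip(theta, m, r, grad):
        mi = beta * mi + (1.0 - beta) * gi      # momentum (moving average of gradients)
        vi = mi / scale                         # scaled search direction
        ri = ri / (1.0 + 2.0 * lr * vi * vi)    # energy update: denominator >= 1
        new_theta.append(th - 2.0 * lr * ri * vi)
        new_m.append(mi)
        new_r.append(ri)
    return new_theta, new_m, new_r


def quadratic(theta):
    """Toy objective f(x) = 0.5 * ||x||^2 and its gradient (hypothetical test problem)."""
    loss = 0.5 * sum(x * x for x in theta)
    return loss, list(theta)


def run_demo(steps=200, c=1.0):
    theta = [3.0, -2.0]
    loss0, _ = quadratic(theta)
    m = [0.0, 0.0]
    r = [math.sqrt(loss0 + c)] * len(theta)     # energy initialised to sqrt(f(theta_0) + c)
    energies = [list(r)]
    for t in range(1, steps + 1):
        loss, grad = quadratic(theta)
        theta, m, r = sgem_step(theta, m, r, grad, loss, t, c=c)
        energies.append(list(r))
    return theta, energies


if __name__ == "__main__":
    theta, energies = run_demo()
    print("final loss:", quadratic(theta)[0])
    print("final energy:", energies[-1])
```

Under these assumptions, each component of `r` can never increase, whatever the step size, because it is always divided by a number that is at least 1. This is the sense in which the sketch mirrors the unconditional energy stability mentioned in the abstract; the convergence rates and regret bound stated there are not demonstrated by this toy example.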

Code Implementations: 1 repo
