LGMLMay 28, 2023

Acceleration of stochastic gradient descent with momentum by averaging: finite-sample rates and asymptotic normality

arXiv:2305.17665v26 citations
Originality Incremental advance
AI Analysis

This work provides theoretical insights into SGDM's optimization properties, benefiting researchers and practitioners in machine learning and statistics by clarifying momentum's role and enabling better algorithm tuning and inference.

The paper analyzes stochastic gradient descent with momentum (SGDM) under strongly convex settings, showing that with large batch sizes, mini-batch SGDM converges faster than mini-batch SGD to a neighborhood of the optimum and allows broader learning rate choices. It also establishes asymptotic normality for a Polyak-averaged SGDM estimator, enabling uncertainty quantification and statistical inference.

Stochastic gradient descent with momentum (SGDM) has been widely used in many machine learning and statistical applications. Despite the observed empirical benefits of SGDM over traditional SGD, the theoretical understanding of the role of momentum for different learning rates in the optimization process remains widely open. We analyze the finite-sample convergence rate of SGDM under the strongly convex settings and show that, with a large batch size, the mini-batch SGDM converges faster than the mini-batch SGD to a neighborhood of the optimal value. Additionally, our findings, supported by theoretical analysis and numerical experiments, indicate that SGDM permits broader choices of learning rates. Furthermore, we analyze the Polyak-averaging version of the SGDM estimator, establish its asymptotic normality, and justify its asymptotic equivalence to the averaged SGD. The asymptotic distribution of the averaged SGDM enables uncertainty quantification of the algorithm output and statistical inference of the model parameters.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes