OC LG ML Nov 11, 2021

Convergence and Stability of the Stochastic Proximal Point Algorithm with Momentum

arXiv:2111.06171v5, 11 citations
Originality: Incremental advance
AI Analysis

This work addresses a gap in optimization theory by analyzing how momentum interacts with stochastic proximal point methods. The advance is incremental, but it could benefit researchers and practitioners in machine learning and optimization.

The paper tackles the interaction of momentum with stochastic proximal point methods, showing that the Stochastic Proximal Point Algorithm with Momentum (SPPAM) achieves faster linear convergence to a neighborhood with a better contraction factor than SPPA under proper tuning, and offers improved stability over SGDM by allowing a wider range of hyperparameters for convergence.

Stochastic gradient descent with momentum (SGDM) is the dominant algorithm in many optimization scenarios, including convex optimization instances and non-convex neural network training. Yet, in the stochastic setting, momentum interferes with gradient noise, so that specific step size and momentum choices are often required just to guarantee convergence, let alone acceleration. Proximal point methods, on the other hand, have gained much attention due to their numerical stability and robustness to imperfect tuning. Their stochastic accelerated variants, though, have received limited attention: how momentum interacts with the stability of (stochastic) proximal point methods remains largely unstudied. To address this, we focus on the convergence and stability of the stochastic proximal point algorithm with momentum (SPPAM), and show that, under proper hyperparameter tuning, SPPAM achieves faster linear convergence to a neighborhood than the stochastic proximal point algorithm (SPPA), with a better contraction factor. In terms of stability, we show that SPPAM depends on problem constants more favorably than SGDM, allowing a wider range of step sizes and momentum values that lead to convergence.
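To make the idea concrete, here is a minimal sketch of a stochastic proximal point iteration with heavy-ball momentum on a least-squares problem. It assumes the update takes the form x_{t+1} = prox_{eta f_i}(x_t + beta (x_t - x_{t-1})), which may differ in detail from the paper's exact formulation. The function names (`prox_least_squares`, `sppam`, `make_problem`), the hyperparameter values, and the noiseless synthetic data are all illustrative assumptions, not taken from the paper.

```python
import numpy as np


def prox_least_squares(y, a, b, eta):
    """Closed-form proximal step for f(x) = 0.5 * (a @ x - b)**2 at point y.

    Solves argmin_x 0.5 * (a @ x - b)**2 + ||x - y||**2 / (2 * eta).
    """
    residual = a @ y - b
    return y - (eta * residual / (1.0 + eta * (a @ a))) * a


def sppam(A, b, eta=1.0, beta=0.2, n_iters=2000, seed=0):
    """Hypothetical SPPAM sketch: momentum extrapolation, then a proximal step.

    eta (step size) and beta (momentum) are illustrative values only.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x_prev = np.zeros(d)
    x = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(n)              # sample one component function f_i
        y = x + beta * (x - x_prev)      # heavy-ball momentum extrapolation
        x_prev, x = x, prox_least_squares(y, A[i], b[i], eta)
    return x


def make_problem(n=50, d=5, seed=1):
    """Synthetic consistent linear system A @ x_star = b (no noise)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))
    x_star = rng.standard_normal(d)
    return A, A @ x_star, x_star


if __name__ == "__main__":
    A, b, x_star = make_problem()
    x_hat = sppam(A, b)
    print("distance to solution:", np.linalg.norm(x_hat - x_star))
```

Because the proximal step is implicit, its effective step length eta / (1 + eta * ||a||^2) stays bounded however large eta is chosen, which is one intuition behind the stability claim. Setting `beta=0.0` in this sketch gives plain SPPA for comparison.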

