Convergence Rate of the Last Iterate of Stochastic Proximal Algorithms

Kevin Kurian Thomas Vaidyan, Michael P. Friedlander, Ahmet Alacaoglu

arXiv:2602.05489v12.5h-index: 14

Originality Incremental advance

AI Analysis

This work provides a more general theoretical understanding of convergence rates for stochastic proximal algorithms, which is important for researchers and practitioners working with large-scale optimization problems, especially in multi-task and federated learning.

This paper analyzes the convergence rate of the last iterate for two stochastic proximal algorithms: proximal stochastic gradient method and randomized incremental proximal method. The authors prove an optimal $\widetilde{O}(1/\sqrt{T})$ convergence rate for both algorithms under componentwise convexity and smoothness, relaxing the common bounded variance assumption.

We analyze two classical algorithms for solving additively composite convex optimization problems where the objective is the sum of a smooth term and a nonsmooth regularizer: proximal stochastic gradient method for a single regularizer; and the randomized incremental proximal method, which uses the proximal operator of a randomly selected function when the regularizer is given as the sum of many nonsmooth functions. We focus on relaxing the bounded variance assumption that is common, yet stringent, for getting last iterate convergence rates. We prove the $\widetilde{O}(1/\sqrt{T})$ rate of convergence for the last iterate of both algorithms under componentwise convexity and smoothness, which is optimal up to log terms. Our results apply directly to graph-guided regularizers that arise in multi-task and federated learning, where the regularizer decomposes as a sum over edges of a collaboration graph.

View on arXiv PDF

Similar