Towards Weaker Variance Assumptions for Stochastic Optimization
This work addresses theoretical limitations in stochastic optimization by weakening variance assumptions, which is incremental but relevant for researchers in optimization theory and machine learning.
The paper revisits and extends a classical variance assumption for stochastic gradient algorithms, analyzing horizon-free, anytime algorithms with last-iterate rates for convex nonsmooth and stochastic optimization problems, including those with functional constraints or min-max settings, without requiring bounded feasible sets.
We revisit a classical assumption for analyzing stochastic gradient algorithms where the squared norm of the stochastic subgradient (or the variance for smooth problems) is allowed to grow as fast as the squared norm of the optimization variable. We contextualize this assumption in view of its inception in the 1960s, its seemingly independent appearance in the recent literature, its relationship to weakest-known variance assumptions for analyzing stochastic gradient algorithms, and its relevance in deterministic problems for non-Lipschitz nonsmooth convex optimization. We build on and extend a connection recently made between this assumption and the Halpern iteration. For convex nonsmooth, and potentially stochastic, optimization, we analyze horizon-free, anytime algorithms with last-iterate rates. For problems beyond simple constrained optimization, such as convex problems with functional constraints or regularized convex-concave min-max problems, we obtain rates for optimality measures that do not require boundedness of the feasible set.