OC LG MLJul 17, 2025

Stochastic Weakly Convex Optimization Under Heavy-Tailed Noises

arXiv:2507.13283v14.1h-index: 4

Originality Incremental advance

AI Analysis

This work addresses incomplete theoretical understanding for stochastic optimization in non-convex, non-smooth settings under heavy-tailed noise, which is incremental as it extends existing analyses to weakly convex objectives.

The paper tackles the problem of analyzing stochastic first-order methods for weakly convex optimization under heavy-tailed gradient noises, specifically sub-Weibull and p-BCM noises, showing that vanilla stochastic subgradient descent maintains theoretical dependence on failure probability and iterations under sub-Weibull noises, and clipped SGD maintains dependence under p-BCM noises but with worse sample complexity compared to a lower bound.

An increasing number of studies have focused on stochastic first-order methods (SFOMs) under heavy-tailed gradient noises, which have been observed in the training of practical deep learning models. In this paper, we focus on two types of gradient noises: one is sub-Weibull noise, and the other is noise under the assumption that it has a bounded $p$-th central moment ($p$-BCM) with $p\in (1, 2]$. The latter is more challenging due to the occurrence of infinite variance when $p\in (1, 2)$. Under these two gradient noise assumptions, the in-expectation and high-probability convergence of SFOMs have been extensively studied in the contexts of convex optimization and standard smooth optimization. However, for weakly convex objectives-a class that includes all Lipschitz-continuous convex objectives and smooth objectives-our understanding of the in-expectation and high-probability convergence of SFOMs under these two types of noises remains incomplete. We investigate the high-probability convergence of the vanilla stochastic subgradient descent (SsGD) method under sub-Weibull noises, as well as the high-probability and in-expectation convergence of clipped SsGD under the $p$-BCM noises. Both analyses are conducted in the context of weakly convex optimization. For weakly convex objectives that may be non-convex and non-smooth, our results demonstrate that the theoretical dependence of vanilla SsGD on the failure probability and number of iterations under sub-Weibull noises does not degrade compared to the case of smooth objectives. Under $p$-BCM noises, our findings indicate that the non-smoothness and non-convexity of weakly convex objectives do not impact the theoretical dependence of clipped SGD on the failure probability relative to the smooth case; however, the sample complexity we derived is worse than a well-known lower bound for smooth optimization.

View on arXiv PDF

Similar