LG AI OC CO ML · Dec 16, 2025

Bias-Variance Trade-off for Clipped Stochastic First-Order Methods: From Bounded Variance to Infinite Mean

arXiv:2512.14686v1

Originality: Incremental advance

AI Analysis

This work addresses stochastic optimization under heavy-tailed noise, a practical challenge in machine learning. It provides incremental theoretical extensions to a previously understudied regime: noise with tail index α∈(0,1], whose mean is infinite.

The paper tackles the problem of stochastic optimization with heavy-tailed noise by analyzing clipped stochastic first-order methods (SFOMs) for noise tail indices α∈(0,2], including cases with infinite mean. It shows that controlling a symmetry measure of the noise tail leads to improved complexity guarantees, with numerical experiments validating the results.
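A small Monte Carlo experiment can illustrate the bias-variance trade-off that clipping creates. This is a minimal sketch under several assumptions. It uses scalar clipping clip(x, τ) = max(-τ, min(τ, x)), standard Cauchy noise (tail index α = 1, so the noise has no mean), and a hypothetical true gradient g = 1. It is not the paper's analysis, and it does not compute the paper's symmetry measure. With a small threshold τ, the clipped gradient has a large bias but its variance is at most τ². With a large threshold, the bias shrinks but the variance grows with τ.

```python
import math
import random
from statistics import fmean


def clip(x: float, tau: float) -> float:
    """Scalar clipping: project x onto the interval [-tau, tau]."""
    return max(-tau, min(tau, x))


def cauchy(rng: random.Random) -> float:
    """Standard Cauchy sample via the inverse CDF (tail index 1, no finite mean)."""
    return math.tan(math.pi * (rng.random() - 0.5))


def clipped_bias_variance(g: float, tau: float, n: int = 200_000, seed: int = 0):
    """Monte Carlo estimate of the bias and variance of clip(g + noise, tau)
    used as an estimator of the true (hypothetical) gradient g."""
    rng = random.Random(seed)
    samples = [clip(g + cauchy(rng), tau) for _ in range(n)]
    mean = fmean(samples)
    variance = fmean((s - mean) ** 2 for s in samples)
    return mean - g, variance


if __name__ == "__main__":
    # Small tau: large bias, tiny variance. Large tau: small bias, large variance.
    for tau in (0.5, 2.0, 10.0, 100.0):
        bias, var = clipped_bias_variance(g=1.0, tau=tau)
        print(f"tau={tau:>6}: bias={bias:+.3f}  variance={var:.2f}")
```

Clipped samples lie in [-τ, τ], so their variance is bounded by τ² even though the raw noise has no mean or variance. Choosing τ therefore trades estimator bias against variance.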

Stochastic optimization is fundamental to modern machine learning. Recent research has extended the study of stochastic first-order methods (SFOMs) from light-tailed to heavy-tailed noise, which frequently arises in practice, with clipping emerging as a key technique for controlling heavy-tailed gradients. Extensive theoretical advances have further shown that the oracle complexity of SFOMs depends on the tail index $α$ of the noise. Nonetheless, existing complexity results often cover only the case $α\in (1,2]$, that is, the regime where the noise has a finite mean, while the complexity bounds tend to infinity as $α$ approaches $1$. This paper tackles the general case of noise with tail index $α\in(0,2]$, covering regimes ranging from noise with bounded variance to noise with an infinite mean, where the latter case has been scarcely studied. Through a novel analysis of the bias-variance trade-off in gradient clipping, we show that when a symmetry measure of the noise tail is controlled, clipped SFOMs achieve improved complexity guarantees in the presence of heavy-tailed noise for any tail index $α\in (0,2]$. Our analysis of the bias-variance trade-off not only yields new unified complexity guarantees for clipped SFOMs across this full range of tail indices, but is also straightforward to apply and can be combined with classical analyses under light-tailed noise to establish oracle complexity guarantees under heavy-tailed noise. Finally, numerical experiments validate our theoretical findings.
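The following sketch shows norm-clipped SGD, x <- x - lr * clip_τ(g), as a rough illustration of a clipped SFOM. It is generic clipped SGD, not the specific methods, step sizes, or thresholds analyzed in the paper. The test problem is a hypothetical f(x) = ½‖x‖², whose exact gradient x is corrupted by i.i.d. standard Cauchy noise (α = 1, infinite mean). The objective, `lr`, `tau`, and the noise model are all illustrative assumptions.

```python
import math
import random


def l2_norm(v: list[float]) -> float:
    """Euclidean norm of a vector stored as a list."""
    return math.sqrt(sum(t * t for t in v))


def clip_by_norm(v: list[float], tau: float) -> list[float]:
    """Rescale v so that its Euclidean norm is at most tau."""
    n = l2_norm(v)
    scale = 1.0 if n <= tau else tau / n
    return [scale * t for t in v]


def run_sgd(x0, lr=0.05, tau=1.0, steps=5000, seed=0, clip=True):
    """SGD on the hypothetical objective f(x) = 0.5 * ||x||^2 (gradient x),
    using the stochastic gradient x + noise with i.i.d. standard Cauchy noise."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        noise = [math.tan(math.pi * (rng.random() - 0.5)) for _ in x]
        g = [x_i + n_i for x_i, n_i in zip(x, noise)]
        if clip:
            g = clip_by_norm(g, tau)  # each step now moves x by at most lr * tau
        x = [x_i - lr * g_i for x_i, g_i in zip(x, g)]
    return x


if __name__ == "__main__":
    x0 = [3.0] * 10
    print(f"initial       ||x|| = {l2_norm(x0):.2f}")
    print(f"clipped SGD   ||x|| = {l2_norm(run_sgd(x0, clip=True)):.2f}")
    print(f"unclipped SGD ||x|| = {l2_norm(run_sgd(x0, clip=False)):.2f}")
```

Each clipped step moves the iterate by at most lr·tau, so a single extreme noise draw cannot throw it far from the minimizer. Passing `clip=False` runs plain SGD on the same noise for comparison. Its iterates inherit heavy-tailed fluctuations from the noise.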

