LGOCMLAug 10, 2025

Online Convex Optimization with Heavy Tails: Old Algorithms, New Regrets, and Applications

arXiv:2508.07473v16 citationsh-index: 1
AI Analysis

This work addresses the problem of robust optimization under heavy-tailed noise for machine learning practitioners, offering incremental improvements by extending known algorithms to more challenging settings.

The paper tackles online convex optimization with heavy-tailed stochastic gradients by analyzing existing algorithms like Online Gradient Descent, showing they achieve optimal regret bounds without modifications such as gradient clipping. It also provides the first provable convergence result for nonsmooth nonconvex optimization under heavy-tailed noise.

In Online Convex Optimization (OCO), when the stochastic gradient has a finite variance, many algorithms provably work and guarantee a sublinear regret. However, limited results are known if the gradient estimate has a heavy tail, i.e., the stochastic gradient only admits a finite $\mathsf{p}$-th central moment for some $\mathsf{p}\in\left(1,2\right]$. Motivated by it, this work examines different old algorithms for OCO (e.g., Online Gradient Descent) in the more challenging heavy-tailed setting. Under the standard bounded domain assumption, we establish new regrets for these classical methods without any algorithmic modification. Remarkably, these regret bounds are fully optimal in all parameters (can be achieved even without knowing $\mathsf{p}$), suggesting that OCO with heavy tails can be solved effectively without any extra operation (e.g., gradient clipping). Our new results have several applications. A particularly interesting one is the first provable convergence result for nonsmooth nonconvex optimization under heavy-tailed noise without gradient clipping. Furthermore, we explore broader settings (e.g., smooth OCO) and extend our ideas to optimistic algorithms to handle different cases simultaneously.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes