ML · LG · PR · Oct 21, 2024

Limit Theorems for Stochastic Gradient Descent with Infinite Variance

arXiv:2410.16340v4 · 4 citations · h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses a gap in the theoretical understanding of SGD under infinite-variance conditions. The advance is incremental, but it extends prior results from the one-dimensional case to the multidimensional case and to broader classes of distributions.

The paper tackles the theoretical analysis of stochastic gradient descent (SGD) when gradients have infinite variance, establishing its asymptotic behavior for a broad class of distributions in multidimensional cases. It characterizes the asymptotic distribution as the stationary distribution of an Ornstein-Uhlenbeck process driven by a stable Lévy process, with applications demonstrated in linear and logistic regression models.
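As a reading aid, the objects named in this summary can be written out in their generic textbook form. This is a sketch under stated assumptions, not the paper's exact statement: the step sizes $\gamma_n$, the drift matrix $A$, and the normalization of the driving process $L_t$ are placeholders introduced here for illustration, and the paper's precise conditions and scaling may differ.

$$
\theta_{n+1} = \theta_n - \gamma_{n+1}\, g(\theta_n, \xi_{n+1}), \qquad \mathbb{P}\big(\|g(\theta, \xi)\| > x\big) = x^{-\alpha} L(x), \quad \alpha \in (1,2),
$$

where $g$ is the stochastic gradient and $L$ is a slowly varying function, so the gradient has a finite mean but infinite variance. An Ornstein-Uhlenbeck process driven by an $\alpha$-stable Lévy process $L_t$ then takes the form

$$
dY_t = -A\, Y_t\, dt + dL_t ,
$$

and, when $A$ is stable (all eigenvalues have positive real part), it admits a stationary distribution of the kind the summary refers to.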

Stochastic gradient descent is a classic algorithm that has gained great popularity, especially in recent decades, as the most common approach for training models in machine learning. While the algorithm has been well studied when stochastic gradients are assumed to have finite variance, there is significantly less research addressing its theoretical properties in the case of infinite variance gradients. In this paper, we establish the asymptotic behavior of stochastic gradient descent in the context of infinite variance stochastic gradients, assuming that the stochastic gradient is regularly varying with index $\alpha\in(1,2)$. The closest result in this context was established in 1969, in the one-dimensional case and assuming that stochastic gradients belong to a more restrictive class of distributions. We extend it to the multidimensional case, covering a broader class of infinite variance distributions. As we show, the asymptotic distribution of the stochastic gradient descent algorithm can be characterized as the stationary distribution of a suitably defined Ornstein-Uhlenbeck process driven by an appropriate stable Lévy process. Additionally, we explore the applications of these results in linear regression and logistic regression models.
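To make the setting concrete, the following is a minimal simulation sketch of SGD on a linear regression model whose observation noise has infinite variance. It is not the paper's experiment: the symmetric Pareto noise, the Gaussian features, the step-size schedule `c / n**rho`, and the function names `symmetric_pareto` and `sgd_linear_regression` are all assumptions chosen here for illustration.

```python
import random


def symmetric_pareto(alpha, rng):
    """Draw X with P(|X| > x) = x**(-alpha) for x >= 1 and a random sign.

    For alpha in (1, 2) the mean is zero by symmetry and the variance is infinite.
    """
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids a zero base below
    magnitude = u ** (-1.0 / alpha)
    return magnitude if rng.random() < 0.5 else -magnitude


def sgd_linear_regression(theta_star, alpha, n_steps, rng, c=0.5, rho=0.75):
    """Run SGD on the squared loss 0.5 * (<theta, x> - y)**2.

    Data model (an assumption for this sketch): y = <theta_star, x> + noise,
    with standard Gaussian features x and symmetric Pareto noise.
    """
    d = len(theta_star)
    theta = [0.0] * d
    for n in range(1, n_steps + 1):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = sum(t * xi for t, xi in zip(theta_star, x)) + symmetric_pareto(alpha, rng)
        residual = sum(t * xi for t, xi in zip(theta, x)) - y
        step = c / n ** rho  # decreasing step size (hypothetical schedule)
        # Stochastic gradient of the squared loss is residual * x.
        theta = [t - step * residual * xi for t, xi in zip(theta, x)]
    return theta


if __name__ == "__main__":
    rng = random.Random(0)
    theta_hat = sgd_linear_regression([1.0, -2.0], alpha=1.5, n_steps=20000, rng=rng)
    print(theta_hat)
```

Repeating the run over many seeds and plotting a histogram of the rescaled error `theta_hat - theta_star` is one way to see the occasional large deviations that a Gaussian limit would not produce.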

