ML, LG, OC, PR, ST | Mar 20, 2024

Analysing heavy-tail properties of Stochastic Gradient Descent by means of Stochastic Recurrence Equations

arXiv:2403.13868v1 | h-index: 19 | J Appl Probab
Originality: Synthesis-oriented
AI Analysis

This work provides incremental theoretical insights into SGD behavior for researchers in machine learning theory, focusing on heavy-tail properties.

The paper analyses heavy-tail properties of Stochastic Gradient Descent (SGD) by extending prior work that models SGD iterations as stochastic recursions. It addresses open questions left by that work and applies the theory of irreducible-proximal matrices to a linear regression setup.

In recent works on the theory of machine learning, it has been observed that heavy-tail properties of Stochastic Gradient Descent (SGD) can be studied in the probabilistic framework of stochastic recursions. In particular, Gürbüzbalaban et al. (arXiv:2006.04740) considered a setup corresponding to linear regression for which iterations of SGD can be modelled by a multivariate affine stochastic recursion $X_k=A_k X_{k-1}+B_k$, for independent and identically distributed pairs $(A_k, B_k)$, where $A_k$ is a random symmetric matrix and $B_k$ is a random vector. In this work, we answer several open questions of the quoted paper and extend their results by applying the theory of irreducible-proximal (i-p) matrices.
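
To make the recursion concrete, here is a minimal sketch of how a mini-batch SGD step on a least-squares loss can be rewritten as $X_k=A_k X_{k-1}+B_k$. It assumes the standard loss $\tfrac12(a^\top x-y)^2$, a step size `eta` and a batch size `b`, giving $A_k = I - \tfrac{\eta}{b}\sum_i a_i a_i^\top$ (symmetric) and $B_k = \tfrac{\eta}{b}\sum_i y_i a_i$. The Gaussian data, the parameter values and all function names are illustrative assumptions, not taken from the paper.

```python
import random


def sgd_step(x, batch, eta):
    """One mini-batch SGD step on the least-squares loss 0.5 * (a.x - y)^2."""
    d, b = len(x), len(batch)
    grad = [0.0] * d
    for a, y in batch:
        residual = sum(ai * xi for ai, xi in zip(a, x)) - y
        for i in range(d):
            grad[i] += residual * a[i] / b
    return [xi - eta * gi for xi, gi in zip(x, grad)]


def affine_coefficients(batch, eta, d):
    """Return (A, B) with A = I - (eta/b) * sum a a^T and B = (eta/b) * sum y a."""
    b = len(batch)
    A = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    B = [0.0] * d
    for a, y in batch:
        for i in range(d):
            B[i] += eta * y * a[i] / b
            for j in range(d):
                A[i][j] -= eta * a[i] * a[j] / b
    return A, B


def affine_step(A, B, x):
    """Apply x -> A x + B."""
    d = len(x)
    return [sum(A[i][j] * x[j] for j in range(d)) + B[i] for i in range(d)]


def demo(seed=0, d=3, b=4, eta=0.1, steps=5):
    """Run SGD and the affine recursion side by side on the same batches.

    Returns the largest coordinate-wise gap between the two trajectories
    and whether every A_k was symmetric (up to rounding).
    """
    rng = random.Random(seed)
    x_sgd = [rng.gauss(0, 1) for _ in range(d)]
    x_aff = list(x_sgd)
    max_gap, symmetric = 0.0, True
    for _ in range(steps):
        # Assumed data model: i.i.d. standard Gaussian features and targets.
        batch = [([rng.gauss(0, 1) for _ in range(d)], rng.gauss(0, 1))
                 for _ in range(b)]
        A, B = affine_coefficients(batch, eta, d)
        symmetric = symmetric and all(
            abs(A[i][j] - A[j][i]) < 1e-12 for i in range(d) for j in range(d)
        )
        x_sgd = sgd_step(x_sgd, batch, eta)
        x_aff = affine_step(A, B, x_aff)
        max_gap = max(max_gap, max(abs(u - v) for u, v in zip(x_sgd, x_aff)))
    return max_gap, symmetric
```

Calling `demo()` runs both updates on identical batches; the two trajectories should agree up to floating-point rounding, which illustrates that the pairs $(A_k, B_k)$ are i.i.d. whenever the batches are.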
