Analysing heavy-tail properties of Stochastic Gradient Descent by means of Stochastic Recurrence Equations
This work offers incremental theoretical insights into the heavy-tail behavior of SGD, aimed at researchers in machine learning theory.
The paper analyzes heavy-tail properties of Stochastic Gradient Descent (SGD) by extending prior work that models SGD iterations as stochastic recursions. It addresses open questions from that work and applies the theory of irreducible-proximal matrices to linear regression setups.
In recent works on the theory of machine learning, it has been observed that heavy-tail properties of Stochastic Gradient Descent (SGD) can be studied in the probabilistic framework of stochastic recursions. In particular, Gürbüzbalaban et al. (arXiv:2006.04740) considered a setup corresponding to linear regression for which iterations of SGD can be modelled by a multivariate affine stochastic recursion $X_k=A_k X_{k-1}+B_k$, for independent and identically distributed pairs $(A_k, B_k)$, where $A_k$ is a random symmetric matrix and $B_k$ is a random vector. In this work, we answer several open questions of the quoted paper and extend their results by applying the theory of irreducible-proximal (i-p) matrices.
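To make the link between SGD and the affine recursion concrete, the following minimal sketch checks numerically that one mini-batch SGD step on a least-squares loss coincides with $X_k=A_k X_{k-1}+B_k$. It assumes the standard mini-batch form $A_k = I - \frac{\eta}{b}\sum_i a_i a_i^\top$ and $B_k = \frac{\eta}{b}\sum_i y_i a_i$. The step size `eta`, batch size `b`, the Gaussian synthetic data, and all function names are illustrative assumptions and are not taken from the abstract.

```python
import random


def sgd_step(x, batch, eta):
    """One mini-batch SGD step on the batch average of 0.5 * (a.x - y)^2."""
    d, b = len(x), len(batch)
    grad = [0.0] * d
    for a, y in batch:
        resid = sum(ai * xi for ai, xi in zip(a, x)) - y
        for j in range(d):
            grad[j] += resid * a[j] / b
    return [xj - eta * gj for xj, gj in zip(x, grad)]


def affine_coefficients(batch, eta, d):
    """Return (A, B) with A = I - (eta/b) sum a a^T and B = (eta/b) sum y a."""
    b = len(batch)
    A = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    B = [0.0] * d
    for a, y in batch:
        for i in range(d):
            B[i] += eta * y * a[i] / b
            for j in range(d):
                A[i][j] -= eta * a[i] * a[j] / b
    return A, B


def affine_step(A, B, x):
    """Apply the affine map x -> A x + B."""
    d = len(x)
    return [sum(A[i][j] * x[j] for j in range(d)) + B[i] for i in range(d)]


def compare_updates(seed=0, d=3, b=4, eta=0.1, steps=50):
    """Run both updates on the same i.i.d. batches.

    Returns the largest coordinate-wise gap between the two iterates and
    whether every sampled A was symmetric (up to rounding).
    """
    rng = random.Random(seed)
    x_sgd = [0.0] * d
    x_aff = [0.0] * d
    max_gap = 0.0
    symmetric = True
    for _ in range(steps):
        # Synthetic data (assumption): Gaussian features and Gaussian targets.
        batch = [([rng.gauss(0, 1) for _ in range(d)], rng.gauss(0, 1))
                 for _ in range(b)]
        A, B = affine_coefficients(batch, eta, d)
        symmetric = symmetric and all(
            abs(A[i][j] - A[j][i]) < 1e-12
            for i in range(d) for j in range(d)
        )
        x_sgd = sgd_step(x_sgd, batch, eta)
        x_aff = affine_step(A, B, x_aff)
        max_gap = max(max_gap, max(abs(u - v) for u, v in zip(x_sgd, x_aff)))
    return max_gap, symmetric


if __name__ == "__main__":
    gap, sym = compare_updates()
    print("max gap between SGD and affine iterates:", gap)
    print("all A_k symmetric:", sym)
```

Because each batch is drawn independently from the same distribution, the pairs $(A_k, B_k)$ produced here are i.i.d. and each $A_k$ is symmetric, matching the setup described above. The sketch only illustrates the recursion itself and does not estimate tail behavior.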