OC · LG · PR · ST · ML · Aug 17, 2023

Hitting the High-Dimensional Notes: An ODE for SGD learning dynamics on GLMs and multi-index models

arXiv:2308.08977v1 · 24 citations · h-index: 17
Originality: Incremental advance
AI Analysis

This work provides theoretical insight into SGD behavior for high-dimensional machine learning models. The advance is incremental but useful to researchers in optimization and statistical learning.

The paper analyzes the dynamics of stochastic gradient descent (SGD) in high-dimensional settings for generalized linear and multi-index models. It derives a deterministic ODE equivalent that predicts statistics such as the risk, gives learning rate thresholds for stability, and reports numerical simulations that agree closely with the theory.

We analyze the dynamics of streaming stochastic gradient descent (SGD) in the high-dimensional limit when applied to generalized linear models and multi-index models (e.g. logistic regression, phase retrieval) with general data-covariance. In particular, we demonstrate a deterministic equivalent of SGD in the form of a system of ordinary differential equations that describes a wide class of statistics, such as the risk and other measures of sub-optimality. This equivalence holds with overwhelming probability when the model parameter count grows proportionally to the number of data. This framework allows us to obtain learning rate thresholds for stability of SGD as well as convergence guarantees. In addition to the deterministic equivalent, we introduce an SDE with a simplified diffusion coefficient (homogenized SGD) which allows us to analyze the dynamics of general statistics of SGD iterates. Finally, we illustrate this theory on some standard examples and show numerical simulations which give an excellent match to the theory.
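To make the idea of a deterministic equivalent concrete, here is a minimal sketch for the simplest special case only: least squares with identity data covariance and step size γ/d. This is an illustration under those stated assumptions, not the paper's general ODE system. In this case a one-step expectation calculation gives, in the limit of large d with time t = k/d, the scalar ODE D'(t) = -(2γ - γ²) D(t) + γ²σ² for the squared distance D = ||x - x*||², which suggests the stability threshold γ < 2. The function names, parameter values, and noise level σ are hypothetical choices made for this example.

```python
import numpy as np


def simulate_streaming_sgd(d=500, gamma=0.5, sigma=0.1, t_max=5.0, seed=0):
    """One-pass (streaming) SGD on least squares with isotropic Gaussian data.

    Assumed setup for this sketch: a ~ N(0, I_d), y = a.x_star + sigma * noise,
    step size gamma / d. Returns times t = k / d and ||x_k - x_star||^2.
    """
    rng = np.random.default_rng(seed)
    x_star = rng.standard_normal(d) / np.sqrt(d)  # squared norm close to 1
    x = np.zeros(d)
    n_steps = int(t_max * d)
    dist = np.empty(n_steps + 1)
    dist[0] = np.sum((x - x_star) ** 2)
    for k in range(n_steps):
        a = rng.standard_normal(d)                      # fresh sample each step
        y = a @ x_star + sigma * rng.standard_normal()  # noisy linear target
        x -= (gamma / d) * (a @ x - y) * a              # SGD step on 0.5 * (a.x - y)^2
        dist[k + 1] = np.sum((x - x_star) ** 2)
    return np.arange(n_steps + 1) / d, dist


def ode_prediction(t, d0, gamma, sigma):
    """Closed-form solution of D' = -(2*gamma - gamma^2) * D + gamma^2 * sigma^2.

    Valid for 0 < gamma < 2, where the decay rate is positive.
    """
    rate = 2 * gamma - gamma ** 2
    d_inf = gamma ** 2 * sigma ** 2 / rate  # limiting value of D
    return d_inf + (d0 - d_inf) * np.exp(-rate * t)


if __name__ == "__main__":
    t, dist = simulate_streaming_sgd()
    pred = ode_prediction(t, dist[0], gamma=0.5, sigma=0.1)
    print("max |SGD - ODE| over the run:", np.max(np.abs(dist - pred)))
```

Rerunning the simulation with a step-size parameter above 2 (for example `gamma=2.5`) should show the squared distance growing rather than shrinking, consistent with the threshold that this toy ODE suggests.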

