High-dimensional Limit of SGD for Diagonal Linear Networks
This work provides a rigorous theoretical framework for understanding SGD dynamics on diagonal linear networks, a simplified neural network setting, and gives explicit non-asymptotic convergence guarantees.
This work derives a stochastic differential equation (SDE) approximation for SGD on diagonal linear networks in the high-dimensional limit, together with a deterministic PDE that characterizes the time evolution of the risk and other observable statistics. Under a suitable parametrization, the stochastic dynamics are shown to converge exponentially fast to zero risk with high probability.
Understanding the behavior of stochastic gradient methods is a central problem in modern machine learning. Recent work has highlighted diagonal linear networks as a simplified yet expressive setting for analyzing the optimization and generalization properties of neural models. In this work, we show that in the high-dimensional regime, stochastic gradient descent on diagonal linear networks is well-approximated by continuous dynamics governed by a stochastic differential equation (SDE), which explicitly decouples the drift from the gradient noise. We further derive a deterministic partial differential equation whose solution propagates the relevant state of the iterates and characterizes the time evolution of a broad class of observable statistics, including the risk, curvature, and other metrics for optimality. Finally, we show that, under a suitable parametrization, the stochastic dynamics are globally well posed and converge exponentially fast to zero risk with high probability, yielding a fully explicit non-asymptotic description of their long-time behavior. Numerical simulations corroborate our theoretical findings.
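For readers who want a concrete picture of the setting, the following is a minimal sketch of online SGD on a diagonal linear network. It is not the paper's experimental setup. Every modeling choice here is an assumption made for illustration: the parametrization beta = u * v (the abstract does not specify the "suitable parametrization"), isotropic Gaussian inputs, noiseless labels, a step size scaled as gamma / d, and the hypothetical names `population_risk` and `run_online_sgd`.

```python
import numpy as np


def population_risk(u, v, beta_star):
    # Assumption: isotropic Gaussian inputs and noiseless labels, so the
    # risk E[(<u*v, x> - <beta_star, x>)^2] / 2 has this closed form.
    return 0.5 * float(np.sum((u * v - beta_star) ** 2))


def run_online_sgd(d=100, steps=5000, gamma=0.2, seed=0):
    """Online (one fresh sample per step) SGD on f(x) = <u * v, x>.

    Returns the population risk recorded before the first step and
    after every step, as an array of length steps + 1.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical target with positive entries, reachable from u = v.
    beta_star = rng.uniform(0.5, 1.5, size=d)
    u = np.full(d, 0.5)
    v = np.full(d, 0.5)
    lr = gamma / d  # assumed high-dimensional step-size scaling

    risks = [population_risk(u, v, beta_star)]
    for _ in range(steps):
        x = rng.standard_normal(d)          # fresh sample each step
        residual = x @ (u * v - beta_star)  # prediction minus label
        # Gradients of 0.5 * residual**2; both are computed before
        # either weight vector is updated.
        grad_u = residual * x * v
        grad_v = residual * x * u
        u = u - lr * grad_u
        v = v - lr * grad_v
        risks.append(population_risk(u, v, beta_star))
    return np.array(risks)


if __name__ == "__main__":
    risks = run_online_sgd()
    print("initial risk:", risks[0])
    print("final risk:  ", risks[-1])
```

Plotting `risks` on a logarithmic axis against the step count is one simple way to inspect the decay of the risk in this toy setting. The sketch only illustrates the discrete iterates and does not implement the SDE or PDE described in the abstract.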