ML, DIS-NN, LG | Feb 12, 2023

From high-dimensional & mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks

arXiv:2302.05882v1 | 44 citations | h-index: 60
Originality: Incremental advance
AI Analysis

The paper provides a theoretical framework for understanding SGD behavior in two-layer neural networks. The advance is incremental in that it synthesizes existing regimes into a single unified analysis.

The paper analyzes the dynamics of one-pass stochastic gradient descent (SGD) in two-layer neural networks trained on Gaussian data. It derives a deterministic, low-dimensional description that unifies the gradient-flow, high-dimensional, and mean-field limits, and shows that, in the high-dimensional limit, the infinite-width dynamics stays close to a subspace spanned by the target principal directions.

This manuscript investigates the one-pass stochastic gradient descent (SGD) dynamics of a two-layer neural network trained on Gaussian data and labels generated by a similar, though not necessarily identical, target function. We rigorously analyse the limiting dynamics via a deterministic and low-dimensional description in terms of the sufficient statistics for the population risk. Our unifying analysis bridges different regimes of interest, such as the classical gradient-flow regime of vanishing learning rate, the high-dimensional regime of large input dimension, and the overparameterised "mean-field" regime of large network width, covering as well the intermediate regimes where the limiting dynamics is determined by the interplay between these behaviours. In particular, in the high-dimensional limit, the infinite-width dynamics is found to remain close to a low-dimensional subspace spanned by the target principal directions. Our results therefore provide a unifying picture of the limiting SGD dynamics with synthetic data.
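As a rough illustration of the setting the abstract describes, the sketch below runs one-pass SGD (a fresh Gaussian sample at every step) on a small two-layer student network whose labels come from a two-layer target network, and records the overlap matrices Q = W Wᵀ/d and M = W W*ᵀ/d as low-dimensional summaries of the weights. The tanh activation, the 1/p and 1/√d scalings, the fixed second layer, the parameter values, and all function names are assumptions chosen for illustration; the paper's exact model and definitions of the sufficient statistics may differ.

```python
import math
import random


def act(z):
    """Assumed activation (tanh); the paper's choice may differ."""
    return math.tanh(z)


def act_prime(z):
    """Derivative of the assumed tanh activation."""
    return 1.0 - math.tanh(z) ** 2


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def overlaps(W, W_star, d):
    """Return Q = W W^T / d (p x p) and M = W W*^T / d (p x k) as nested lists."""
    Q = [[dot(wi, wj) / d for wj in W] for wi in W]
    M = [[dot(wi, wr) / d for wr in W_star] for wi in W]
    return Q, M


def one_pass_sgd(d=50, p=2, k=1, lr=0.5, steps=2000, record_every=500, seed=0):
    """One-pass SGD on the squared loss, training first-layer weights only.

    Hypothetical setup: student f(x) = (1/p) sum_j act(w_j . x / sqrt(d)),
    target  f*(x) = (1/k) sum_r act(w*_r . x / sqrt(d)), inputs x ~ N(0, I_d).
    Returns a list of (Q, M) snapshots, starting with the initial weights.
    """
    rng = random.Random(seed)
    sqrt_d = math.sqrt(d)
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(p)]
    W_star = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]

    history = [overlaps(W, W_star, d)]
    for t in range(1, steps + 1):
        # A fresh sample at every step: no data point is ever reused.
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        lam = [dot(w, x) / sqrt_d for w in W]            # student pre-activations
        lam_star = [dot(w, x) / sqrt_d for w in W_star]  # target pre-activations
        y = sum(act(z) for z in lam_star) / k
        err = sum(act(z) for z in lam) / p - y

        # Gradient of 0.5 * err**2 with respect to each hidden unit w_j.
        for j in range(p):
            g = err * act_prime(lam[j]) / (p * sqrt_d)
            W[j] = [w - lr * g * xi for w, xi in zip(W[j], x)]

        if t % record_every == 0:
            history.append(overlaps(W, W_star, d))
    return history


if __name__ == "__main__":
    for Q, M in one_pass_sgd():
        print("Q diag:", [round(Q[j][j], 3) for j in range(len(Q))],
              "M:", [[round(m, 3) for m in row] for row in M])
```

With this scaling, each update changes the entries of Q and M by an amount of order 1/d, so the number of steps needed to see macroscopic change grows with d. This is only meant to make the three knobs in the abstract (learning rate `lr`, input dimension `d`, width `p`) concrete, not to reproduce the paper's limiting equations.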

Code Implementations: 1 repo