ML · DIS-NN · LG · Feb 1, 2022

Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks

arXiv:2202.00293v4 · 54 citations
Originality: Incremental advance
AI Analysis

This work addresses optimization challenges in narrow neural networks and gives machine learning practitioners insight into how network width interacts with the learning dynamics. It is an incremental advance, since it builds on existing statistical physics frameworks.

The paper investigates the cross-over between global convergence and poor generalization in two-layer neural networks trained with stochastic gradient descent. Focusing on high-dimensional Gaussian data, it extends a deterministic description of the SGD dynamics and provides rigorous convergence rates for it.

Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the cross-over between these two regimes in the high-dimensional setting, and in particular investigate the connection between the so-called mean-field/hydrodynamic regime and the seminal approach of Saad & Solla. Focusing on the case of Gaussian data, we study the interplay between the learning rate, the time scale, and the number of hidden units in the high-dimensional dynamics of stochastic gradient descent (SGD). Our work builds on a deterministic description of SGD in high dimensions from statistical physics, which we extend and for which we provide rigorous convergence rates.
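To make the setting concrete, here is a minimal pure-Python sketch of one-pass (online) SGD in a teacher-student two-layer network with Gaussian inputs, in the spirit of Saad & Solla. Everything specific is an assumption chosen for illustration, not the paper's exact setup: the activation g(x) = erf(x/√2), a fixed second layer that averages the hidden units (1/p over the student's p units, 1/k over the teacher's k units), squared loss, a learning rate of order one with time measured as t = steps/d, small random initialization, and a modest d so it runs quickly. The names `gen_error` and `train_online_sgd` are hypothetical. The point it illustrates is that the population error depends on the weights only through the overlaps Q = WWᵀ/d, M = WW*ᵀ/d and P = W*W*ᵀ/d, which is what makes a low-dimensional deterministic description of the dynamics possible; the closed form uses the standard identity E[g(a)g(b)] = (2/π) arcsin(C_ab / √((1+C_aa)(1+C_bb))) for this choice of g.

```python
import math
import random

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)


def g(x):
    """Assumed activation for this sketch: erf(x / sqrt(2))."""
    return math.erf(x / math.sqrt(2.0))


def g_prime(x):
    """Derivative of g: sqrt(2/pi) * exp(-x^2 / 2)."""
    return SQRT_2_OVER_PI * math.exp(-0.5 * x * x)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pair_corr(caa, cbb, cab):
    """E[g(a) g(b)] for zero-mean jointly Gaussian a, b with
    Var(a) = caa, Var(b) = cbb, Cov(a, b) = cab."""
    return (2.0 / math.pi) * math.asin(cab / math.sqrt((1.0 + caa) * (1.0 + cbb)))


def gen_error(student, teacher):
    """Closed-form population error 0.5 * E[(f(x) - y(x))^2], x ~ N(0, I_d).

    It depends on the weights only through the overlaps Q, M and P.
    """
    d = len(teacher[0])
    p, k = len(student), len(teacher)
    Q = [[dot(a, b) / d for b in student] for a in student]  # student-student
    M = [[dot(a, b) / d for b in teacher] for a in student]  # student-teacher
    P = [[dot(a, b) / d for b in teacher] for a in teacher]  # teacher-teacher
    ss = sum(pair_corr(Q[j][j], Q[i][i], Q[j][i]) for j in range(p) for i in range(p)) / p**2
    st = sum(pair_corr(Q[j][j], P[r][r], M[j][r]) for j in range(p) for r in range(k)) / (p * k)
    tt = sum(pair_corr(P[r][r], P[s][s], P[r][s]) for r in range(k) for s in range(k)) / k**2
    return 0.5 * (ss - 2.0 * st + tt)


def train_online_sgd(d=80, p=4, k=2, lr=1.0, t_max=40.0, init_std=0.1, seed=0):
    """One-pass SGD: each step uses a fresh Gaussian input; time t = steps / d.

    Returns the population error before and after training.
    """
    rng = random.Random(seed)
    teacher = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    student = [[rng.gauss(0.0, init_std) for _ in range(d)] for _ in range(p)]
    err0 = gen_error(student, teacher)
    sqrt_d = math.sqrt(d)
    for _ in range(int(t_max * d)):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        lam = [dot(w, x) / sqrt_d for w in student]  # student preactivations
        nu = [dot(w, x) / sqrt_d for w in teacher]   # teacher preactivations
        y = sum(g(v) for v in nu) / k                # teacher label (no noise)
        f = sum(g(lj) for lj in lam) / p             # student prediction
        for j in range(p):
            # Gradient of 0.5 * (f - y)^2 with respect to student[j].
            coeff = lr * (f - y) * g_prime(lam[j]) / (p * sqrt_d)
            student[j] = [wi - coeff * xi for wi, xi in zip(student[j], x)]
    return err0, gen_error(student, teacher)


if __name__ == "__main__":
    err0, err1 = train_online_sgd()
    print(f"population error before: {err0:.4f}, after: {err1:.4f}")
```

Varying `p`, `lr` and `t_max` in this sketch is one informal way to explore the interplay between width, learning rate and time scale that the paper studies; it does not reproduce the paper's precise scalings or its phase diagram.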

Code Implementations: 2 repos
