LG, DIS-NN, ST, ML · Jun 10, 2020

Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification

arXiv:2006.06098v2 · 85 citations
Originality: Incremental advance
AI Analysis

This work offers theoretical insight into SGD dynamics for a specific non-convex problem. The advance is incremental because it builds on existing methods such as dynamical mean-field theory.

The authors study the learning dynamics of stochastic gradient descent (SGD) in a non-convex setting by analyzing a single-layer neural network that classifies a high-dimensional Gaussian mixture. The result is a closed-form analysis that reveals how the algorithm navigates the loss landscape.

We analyze in a closed form the learning dynamics of stochastic gradient descent (SGD) for a single-layer neural network classifying a high-dimensional Gaussian mixture where each cluster is assigned one of two labels. This problem provides a prototype of a non-convex loss landscape with interpolating regimes and a large generalization gap. We define a particular stochastic process for which SGD can be extended to a continuous-time limit that we call stochastic gradient flow. In the full-batch limit, we recover the standard gradient flow. We apply dynamical mean-field theory from statistical physics to track the dynamics of the algorithm in the high-dimensional limit via a self-consistent stochastic process. We explore the performance of the algorithm as a function of the control parameters shedding light on how it navigates the loss landscape.
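
To make the setup concrete, here is a minimal NumPy sketch of mini-batch SGD for a single-layer classifier on a Gaussian mixture. It is an illustration under stated assumptions, not the authors' procedure. It uses the simpler two-cluster variant (centres at +mu and -mu), a logistic loss with a ridge term, and independent batch resampling at every step. The names `make_mixture`, `sgd_train`, `batch_frac`, and `ridge` are hypothetical. Setting `batch_frac=1.0` uses every sample at every step, which reduces the update to plain full-batch gradient descent.

```python
import numpy as np


def make_mixture(n, d, rng, noise=0.5):
    """Sample n points in d dimensions from two Gaussian clusters.

    Assumption: clusters sit at +mu and -mu (mu has unit norm) and carry
    labels +1 and -1. The abstract covers more general cluster/label
    assignments; this is only the simplest case.
    """
    mu = np.ones(d) / np.sqrt(d)
    y = rng.choice([-1.0, 1.0], size=n)
    X = y[:, None] * mu + noise * rng.standard_normal((n, d))
    return X, y


def sgd_train(X, y, batch_frac, lr=0.1, ridge=0.01, steps=500, seed=0):
    """Mini-batch SGD on a ridge-regularized logistic loss.

    Each sample joins the batch independently with probability batch_frac.
    This sampling rule is an assumption made for illustration.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)  # deterministic start so runs are comparable
    for _ in range(steps):
        mask = rng.random(n) < batch_frac
        if not mask.any():
            continue  # empty batch: skip this step
        margins = y[mask] * (X[mask] @ w)
        # d/dw log(1 + exp(-margin)) = -y * x / (1 + exp(margin))
        coeff = -y[mask] / (1.0 + np.exp(margins))
        grad = X[mask].T @ coeff / mask.sum() + ridge * w
        w -= lr * grad
    return w


def accuracy(w, X, y):
    """Fraction of points whose predicted sign matches the label."""
    return float(np.mean(np.sign(X @ w) == y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train, y_train = make_mixture(2000, 50, rng)
    X_test, y_test = make_mixture(2000, 50, rng)
    for b in (0.1, 0.5, 1.0):
        w = sgd_train(X_train, y_train, batch_frac=b)
        print(f"batch_frac={b}: test accuracy={accuracy(w, X_test, y_test):.3f}")
```

Varying `batch_frac`, `lr`, and `ridge` gives a rough empirical counterpart to exploring performance as a function of the control parameters, although this toy two-cluster problem does not reproduce the non-convex landscape analyzed in the paper.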
