AP LG ML May 27, 2020

On the Convergence of Gradient Descent Training for Two-layer ReLU-networks in the Mean Field Regime

arXiv:2005.13530v1
55 citations
AI Analysis

This paper offers theoretical insights for researchers in machine learning theory, but it is incremental, building on existing work by Chizat and Bach.

The paper tackles the problem of determining when gradient descent training for two-layer ReLU-networks converges to minimum Bayes risk (MBR) in the mean field regime. It extends prior results to ReLU activations and to cases in which no parameters achieve MBR exactly, and the condition it gives is independent of the initialization.

We describe a necessary and sufficient condition for the convergence to minimum Bayes risk (MBR) when training two-layer ReLU-networks by gradient descent in the mean field regime with omni-directional initial parameter distribution. This article extends recent results of Chizat and Bach to ReLU-activated networks and to the situation in which there are no parameters which exactly achieve MBR. The condition does not depend on the initialization of parameters and concerns only the weak convergence of the realization of the neural network, not its parameter distribution.
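To make the setting in the abstract concrete, the following is a minimal NumPy sketch of gradient descent on a two-layer ReLU network with mean field normalization. It is an illustration under stated assumptions, not the paper's construction: the function names (`init_params`, `forward`, `risk`, `gd_step`, `train`) are hypothetical, the squared loss on a finite sample stands in for the risk functional, a standard Gaussian initialization is used only as a stand-in for an omni-directional distribution, and the step size is multiplied by the width `m`, a common convention for matching the mean field time scale.

```python
import numpy as np


def init_params(m, d, rng):
    """Sample m particles (a_i, w_i) for a width-m network.

    Assumption: a standard Gaussian is used as a stand-in for an
    omni-directional initial distribution; the paper's precise
    definition is not reproduced here.
    """
    a = rng.standard_normal(m)
    w = rng.standard_normal((m, d))
    return a, w


def forward(a, w, x):
    """Realization f(x) = (1/m) * sum_i a_i * relu(w_i . x).

    The 1/m factor (rather than 1/sqrt(m)) is the mean field scaling.
    A bias can be modelled by appending a constant coordinate to x.
    """
    pre = x @ w.T                       # shape (n, m)
    return (np.maximum(pre, 0.0) @ a) / a.shape[0]


def risk(a, w, x, y):
    """Empirical squared-loss risk (assumed loss, for illustration)."""
    return 0.5 * np.mean((forward(a, w, x) - y) ** 2)


def gd_step(a, w, x, y, lr):
    """One gradient descent step on the empirical risk."""
    n, m = x.shape[0], a.shape[0]
    pre = x @ w.T                       # pre-activations, shape (n, m)
    act = np.maximum(pre, 0.0)          # ReLU activations
    resid = act @ a / m - y             # f(x_k) - y_k, shape (n,)

    # Gradients of the risk with respect to each particle.
    grad_a = act.T @ resid / (n * m)
    grad_w = ((resid[:, None] * (pre > 0)).T @ x) * a[:, None] / (n * m)

    # Assumed convention: scale the step by m so that each particle
    # moves at a rate independent of the width.
    return a - lr * m * grad_a, w - lr * m * grad_w


def train(a, w, x, y, lr, steps):
    """Run a fixed number of gradient descent steps."""
    for _ in range(steps):
        a, w = gd_step(a, w, x, y, lr)
    return a, w
```

Note that `forward` depends on the parameters only through their empirical distribution: permuting the particles leaves the realization unchanged. This is the sense in which a condition can concern the realization of the network rather than its parameter distribution.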

