On the Convergence of Gradient Descent Training for Two-layer ReLU-networks in the Mean Field Regime
This work provides theoretical insights for researchers in machine learning theory, but it is incremental, as it builds on existing work by Chizat and Bach.
The paper addresses the question of when gradient descent training of two-layer ReLU-networks in the mean field regime converges to the minimum Bayes risk (MBR). It extends prior results to ReLU activations and to the case in which no parameters exactly achieve MBR, and the condition it gives is independent of the initialization.
We describe a necessary and sufficient condition for convergence to minimum Bayes risk (MBR) when training two-layer ReLU-networks by gradient descent in the mean field regime with an omni-directional initial parameter distribution. This article extends recent results of Chizat and Bach to ReLU-activated networks and to the situation in which no parameters exactly achieve MBR. The condition does not depend on the initialization of the parameters and concerns only the weak convergence of the realization of the neural network, not of its parameter distribution.
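To make the setting concrete, here is a minimal Python sketch (not the authors' construction) of gradient descent in the mean field parametrization f(x) = (1/m) Σ_i a_i relu(w_i x + b_i) with scalar inputs. Several choices are assumptions that do not come from the abstract: a finite sample with target |x| and squared loss stands in for the population risk, a standard Gaussian draw of (a_i, w_i, b_i) plays the role of the omni-directional initial distribution (the directions of (w_i, b_i) then cover the whole circle), and each particle's step is scaled by m so that per-particle updates do not vanish as the width grows. The names `train`, `gd_step`, `init_params`, and all hyperparameters are hypothetical. The sketch only illustrates the parametrization and does not check the paper's condition.

```python
import random


def relu(z: float) -> float:
    return z if z > 0.0 else 0.0


def predict(params, x):
    """Mean field two-layer ReLU network: f(x) = (1/m) * sum_i a_i * relu(w_i * x + b_i)."""
    m = len(params)
    return sum(a * relu(w * x + b) for a, w, b in params) / m


def risk(params, data):
    """Empirical squared-loss risk (1/(2n)) * sum_j (f(x_j) - y_j)^2."""
    return sum((predict(params, x) - y) ** 2 for x, y in data) / (2 * len(data))


def init_params(m, rng):
    """Assumed omni-directional initialization: (a, w, b) ~ N(0, I_3),
    so the direction of (w, b) is uniformly distributed on the circle."""
    return [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(m)]


def gd_step(params, data, lr):
    """One mean field gradient step: every particle moves along m times its
    partial gradient of the risk, i.e. the 1/m from f cancels out."""
    n = len(data)
    residuals = [(x, predict(params, x) - y) for x, y in data]
    new_params = []
    for a, w, b in params:
        ga = gw = gb = 0.0
        for x, r in residuals:
            pre = w * x + b
            if pre > 0.0:  # relu'(0) is taken to be 0 (a subgradient choice)
                ga += r * pre
                gw += r * a * x
                gb += r * a
        new_params.append((a - lr * ga / n, w - lr * gw / n, b - lr * gb / n))
    return new_params


def train(m=100, steps=200, lr=0.5, seed=0):
    """Run gradient descent and return the empirical risk after every step."""
    rng = random.Random(seed)
    # 20 equally spaced inputs in [-1, 1]; target |x| = relu(x) + relu(-x).
    xs = [-1.0 + 2.0 * j / 19 for j in range(20)]
    data = [(x, abs(x)) for x in xs]
    params = init_params(m, rng)
    history = [risk(params, data)]
    for _ in range(steps):
        params = gd_step(params, data, lr)
        history.append(risk(params, data))
    return history


if __name__ == "__main__":
    hist = train()
    print(f"initial risk {hist[0]:.4f}, final risk {hist[-1]:.4f}")
```

For example, `train(m=100, steps=200, lr=0.5)` returns the empirical risk recorded after each step. The abstract's condition concerns the infinite-width limit and the weak convergence of the realization f, so a finite-width run like this can at most suggest, not demonstrate, convergence to MBR.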