ML | STAT-MECH | LG | ST

Feb 16, 2019

Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit

Song Mei, Theodor Misiakiewicz, Andrea Montanari

arXiv:1902.06015v137.8326 citations

Originality: Incremental advance

AI Analysis

This work provides theoretical guarantees for neural network training. The advance is incremental, but it addresses limitations of prior bounds and is aimed at researchers in machine learning theory.

The paper tackles the problem of approximating the learning dynamics of two-layer neural networks via mean-field theory, establishing dimension-free bounds and extending results to unbounded activation functions and noisy stochastic gradient descent, while also recovering kernel ridge regression as a special limit.

We consider learning two-layer neural networks using stochastic gradient descent. The mean-field description of this learning dynamics approximates the evolution of the network weights by an evolution in the space of probability distributions in $\mathbb{R}^D$ (where $D$ is the number of parameters associated with each neuron). This evolution can be defined through a partial differential equation or, equivalently, as the gradient flow in the Wasserstein space of probability distributions. Earlier work shows that, under some regularity assumptions, the mean-field description is accurate as soon as the number of hidden units is much larger than the dimension $D$. In this paper we establish stronger and more general approximation guarantees. First, we show that the number of hidden units only needs to be larger than a quantity that depends on the regularity properties of the data and is independent of the dimension. Next, we generalize this analysis to the case of unbounded activation functions, which was not covered by earlier bounds. We also extend our results to noisy stochastic gradient descent. Finally, we show that kernel ridge regression can be recovered as a special limit of the mean-field analysis.
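To make the setting in the abstract concrete, the following is a minimal sketch of online SGD on a two-layer network in the mean-field normalization, where the output is an average over the $N$ hidden units and each unit carries $D = d + 1$ parameters. Everything specific here is an assumption made for illustration and is not taken from the paper: the `tanh` activation, the synthetic teacher data, the function names, the step-size convention, and the form of the optional noisy update (weight decay `lam` plus Gaussian noise with temperature `tau`).

```python
import math
import random


def predict(a, w, x):
    """Mean-field output: an average (not a sum) over the N hidden units."""
    n = len(a)
    total = 0.0
    for i in range(n):
        pre = sum(wj * xj for wj, xj in zip(w[i], x))
        total += a[i] * math.tanh(pre)
    return total / n


def sgd_step(a, w, x, y, step, lam=0.0, tau=0.0, rng=random):
    """One online SGD step on the squared loss, updating a and w in place.

    The 1/N factor of the gradient is absorbed into the step size, so each
    neuron moves by O(step) per sample (assumed mean-field scaling).
    """
    n, d = len(a), len(x)
    h = [math.tanh(sum(wj * xj for wj, xj in zip(w[i], x))) for i in range(n)]
    err = y - sum(a[i] * h[i] for i in range(n)) / n
    shrink = 1.0 - 2.0 * step * lam                    # weight decay (assumed form)
    noise = math.sqrt(2.0 * step * tau / (d + 1))      # noise scale (assumed form)
    for i in range(n):
        a_old = a[i]
        dact = 1.0 - h[i] ** 2                         # derivative of tanh
        for j in range(d):
            w[i][j] = shrink * w[i][j] + 2.0 * step * err * a_old * dact * x[j]
            if noise:
                w[i][j] += noise * rng.gauss(0.0, 1.0)
        a[i] = shrink * a_old + 2.0 * step * err * h[i]
        if noise:
            a[i] += noise * rng.gauss(0.0, 1.0)


def train(n_hidden=50, dim=3, n_steps=2000, step=0.05, lam=0.0, tau=0.0, seed=0):
    """Train on a hypothetical teacher y = tanh(<u, x>); return (MSE before, MSE after)."""
    rng = random.Random(seed)
    teacher = [1.0 / math.sqrt(dim)] * dim

    def sample():
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = math.tanh(sum(u * xj for u, xj in zip(teacher, x)))
        return x, y

    test_set = [sample() for _ in range(200)]

    def mse(a, w):
        return sum((y - predict(a, w, x)) ** 2 for x, y in test_set) / len(test_set)

    a = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    w = [[rng.gauss(0.0, 1.0) / math.sqrt(dim) for _ in range(dim)]
         for _ in range(n_hidden)]
    before = mse(a, w)
    for _ in range(n_steps):
        x, y = sample()
        sgd_step(a, w, x, y, step, lam, tau, rng)
    return before, mse(a, w)


if __name__ == "__main__":
    before, after = train()
    print(f"test MSE before: {before:.4f}, after: {after:.4f}")
```

In this sketch the empirical distribution of the pairs `(a[i], w[i])` plays the role of the probability distribution in $\mathbb{R}^D$ whose evolution the mean-field description tracks. Setting `tau > 0` gives one possible version of noisy SGD.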
