Non-asymptotic approximations of neural networks by Gaussian processes
This paper provides theoretical insights for researchers in machine learning theory, but the contribution is incremental, since it builds on known asymptotic results.
The paper quantifies the convergence of wide neural networks to Gaussian processes under random initialization, establishing explicit convergence rates in an infinite-dimensional functional space. It identifies two regimes: for polynomial activations the rate is determined by the degree, while for non-polynomial activations it depends on the smoothness of the activation.
We study the extent to which wide neural networks may be approximated by Gaussian processes when initialized with random weights. It is a well-established fact that as the width of a network goes to infinity, its law converges to that of a Gaussian process. We make this quantitative by establishing explicit convergence rates for the central limit theorem in an infinite-dimensional functional space, metrized with a natural transportation distance. We identify two regimes of interest: when the activation function is polynomial, its degree determines the rate of convergence, while for non-polynomial activations, the rate is governed by the smoothness of the activation.
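The convergence described above can be illustrated numerically in a much simpler setting than the one the paper treats. The sketch below is an illustration under assumptions that are not taken from the paper: a one-hidden-layer network with standard Gaussian weights, a tanh activation, a single fixed input, and excess kurtosis as a crude one-dimensional proxy for distance to a Gaussian. It does not compute the transportation distance in functional space, and the helper names (`sample_network_outputs`, `excess_kurtosis`, `kurtosis_by_width`) are hypothetical.

```python
import numpy as np


def sample_network_outputs(width, num_samples, x, activation, rng):
    """Evaluate many independently initialized one-hidden-layer networks at x.

    Assumed architecture (not from the paper):
        f(x) = width**-0.5 * sum_i v_i * activation(w_i . x)
    with w_i ~ N(0, I/d) and v_i ~ N(0, 1), all independent.
    """
    d = x.shape[0]
    # One independent weight draw per sampled network.
    W = rng.standard_normal((num_samples, width, d)) / np.sqrt(d)
    v = rng.standard_normal((num_samples, width))
    hidden = activation(W @ x)  # shape (num_samples, width)
    return (v * hidden).sum(axis=1) / np.sqrt(width)


def excess_kurtosis(samples):
    """Empirical excess kurtosis; it is zero for an exact Gaussian."""
    centered = samples - samples.mean()
    m2 = np.mean(centered ** 2)
    m4 = np.mean(centered ** 4)
    return m4 / m2 ** 2 - 3.0


def kurtosis_by_width(widths, num_samples=20000, seed=0):
    """Map each width to the excess kurtosis of the network output at x."""
    rng = np.random.default_rng(seed)
    x = np.ones(3)  # fixed input, chosen so that w_i . x has unit variance
    return {
        n: excess_kurtosis(
            sample_network_outputs(n, num_samples, x, np.tanh, rng)
        )
        for n in widths
    }


if __name__ == "__main__":
    for n, k in kurtosis_by_width([1, 4, 16, 64]).items():
        print(f"width={n:3d}  excess kurtosis={k:+.3f}")
```

In this toy setting the output at a fixed input is a normalized sum of `width` independent, identically distributed terms, so its excess kurtosis decays like 1/width, up to sampling noise. This shows only the one-dimensional central limit effect; the rates in the paper concern the law of the whole random function and depend on the activation as described above.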