Rate of Convergence of Polynomial Networks to Gaussian Processes
This paper provides theoretical insight into neural network behavior for researchers in machine learning theory, though the contribution is incremental, as it builds on known convergence results.
The paper quantifies how quickly one-hidden-layer neural networks with random weights converge to Gaussian processes as the number of hidden neurons $n$ increases. It shows a rate of $O(n^{-\frac{1}{2}})$ in the 2-Wasserstein metric for polynomial activations, and improves the known rates for ReLU and erf activations.
We examine one-hidden-layer neural networks with random weights. It is well known that in the limit of infinitely many neurons they converge to Gaussian processes. For networks with a polynomial activation, we demonstrate that the rate of this convergence in the 2-Wasserstein metric is $O(n^{-\frac{1}{2}})$, where $n$ is the number of hidden neurons. We suspect this rate is asymptotically sharp. We also improve the known convergence rates for other activations: to a power law in $n$ for ReLU, and to inverse square root, up to logarithmic factors, for erf. We explore the interplay between spherical harmonics, Stein kernels and optimal transport in the non-isotropic setting.
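As a rough numerical illustration of the polynomial case, the sketch below simulates the network output at a single fixed unit-norm input and estimates its 2-Wasserstein distance to the limiting Gaussian. Everything specific here is an assumption made for illustration and is not taken from the paper: the activation $\sigma(z) = z^2$, standard Gaussian inner and outer weights, the $n^{-\frac{1}{2}}$ output scaling, and the helper names `sample_outputs`, `w2_to_gaussian` and `estimate_distances`. It examines only a one-point marginal, not the process-level metric, and the empirical estimate has a sampling-noise floor that masks the decay once $n$ is large relative to the number of trials.

```python
import numpy as np
from statistics import NormalDist


def sample_outputs(n, trials, rng):
    """Draw `trials` outputs f(x) = n^{-1/2} * sum_i v_i * sigma(w_i . x).

    Assumptions (illustrative only): sigma(z) = z**2, v_i ~ N(0, 1),
    w_i ~ N(0, I), and |x| = 1, so each w_i . x is a standard normal
    and can be sampled directly.
    """
    g = rng.standard_normal((trials, n))  # pre-activations w_i . x
    v = rng.standard_normal((trials, n))  # outer weights
    return (v * g**2).sum(axis=1) / np.sqrt(n)


def w2_to_gaussian(samples, sigma):
    """Estimate the 1-D 2-Wasserstein distance to N(0, sigma^2).

    In one dimension W_2 is the L2 distance between quantile functions,
    so sorted samples are compared with Gaussian quantiles at midpoints.
    """
    xs = np.sort(samples)
    levels = (np.arange(len(xs)) + 0.5) / len(xs)
    gauss = NormalDist(0.0, sigma)
    q = np.array([gauss.inv_cdf(u) for u in levels.tolist()])
    return float(np.sqrt(np.mean((xs - q) ** 2)))


def estimate_distances(widths, trials=20000, seed=0):
    """Return {n: estimated W_2 distance} for each hidden width n."""
    rng = np.random.default_rng(seed)
    # Var(v * g^2) = E[v^2] * E[g^4] = 3, so the limit is N(0, 3).
    sigma = np.sqrt(3.0)
    return {n: w2_to_gaussian(sample_outputs(n, trials, rng), sigma)
            for n in widths}


if __name__ == "__main__":
    for n, dist in estimate_distances([1, 4, 16, 64, 256]).items():
        print(f"n = {n:4d}   estimated W2 = {dist:.4f}")
```

Under these assumptions the estimated distance should shrink as $n$ grows until it reaches the noise floor set by the number of trials; increasing `trials` lowers that floor at the cost of memory and run time.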