Can Shallow Neural Networks Beat the Curse of Dimensionality? A mean field training perspective
This work addresses the fundamental challenge of training efficiency for neural networks in high-dimensional settings. The contribution is incremental: it builds on existing mean field theory to provide new theoretical bounds.
The paper proves that gradient descent training of two-layer neural networks under mean field scaling may not reduce population risk faster than a rate of $t^{-4/(d-2)}$, so training can suffer from the curse of dimensionality for high-dimensional data. Numerical evidence shows convergence slowing as the dimension increases for general Lipschitz target functions, but roughly the same rate in all dimensions when the target function lies in the natural function space for two-layer ReLU networks.
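For readers unfamiliar with the term, mean field scaling normalizes the output of a width-$m$ two-layer network by $1/m$ rather than $1/\sqrt{m}$. The following is a minimal sketch of that parametrization with one gradient descent step on a single sample. The function names, the squared loss, and the choice to rescale the step size by $m$ are illustrative assumptions, not the paper's code or experimental setup.

```python
def relu(z):
    return z if z > 0.0 else 0.0

def preactivations(x, w):
    # Inner products <w_i, x> for every hidden neuron i.
    return [sum(wij * xj for wij, xj in zip(wi, x)) for wi in w]

def mean_field_net(x, a, w):
    # f(x) = (1/m) * sum_i a_i * relu(<w_i, x>): the 1/m factor is the mean field scaling.
    m = len(a)
    return sum(ai * relu(p) for ai, p in zip(a, preactivations(x, w))) / m

def gd_step(x, y, a, w, lr):
    # One gradient descent step on the loss 0.5 * (f(x) - y)^2.
    # Assumption: the step size is multiplied by m, which cancels the 1/m
    # in the gradient so each neuron moves at a width-independent speed.
    pre = preactivations(x, w)
    err = mean_field_net(x, a, w) - y
    new_a = [ai - lr * err * relu(p) for ai, p in zip(a, pre)]
    new_w = [
        [wij - lr * err * ai * (1.0 if p > 0.0 else 0.0) * xj
         for wij, xj in zip(wi, x)]
        for ai, wi, p in zip(a, w, pre)
    ]
    return new_a, new_w

if __name__ == "__main__":
    # Hypothetical toy data: d = 2 inputs, m = 2 neurons, target value 0.
    x, y = [1.0, 2.0], 0.0
    a = [1.0, -1.0]
    w = [[1.0, 0.0], [0.0, 1.0]]
    print("before:", mean_field_net(x, a, w))
    a, w = gd_step(x, y, a, w, lr=0.1)
    print("after: ", mean_field_net(x, a, w))
```

Both layers are updated from gradients evaluated at the old parameters, which is what plain (simultaneous) gradient descent requires.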
We prove that the gradient descent training of a two-layer neural network on empirical or population risk may not decrease population risk at an order faster than $t^{-4/(d-2)}$ under mean field scaling. Thus gradient descent training for fitting reasonably smooth, but truly high-dimensional data may be subject to the curse of dimensionality. We present numerical evidence that gradient descent training with general Lipschitz target functions becomes slower and slower as the dimension increases, but converges at approximately the same rate in all dimensions when the target function lies in the natural function space for two-layer ReLU networks.
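To make the dimension dependence of the rate $t^{-4/(d-2)}$ concrete, the short sketch below computes the exponent and the factor by which training time $t$ would have to grow for a quantity decaying like $t^{-4/(d-2)}$ to shrink by a given factor. The helper names are hypothetical, and this is arithmetic on the stated rate only, not a simulation of training.

```python
def rate_exponent(d):
    # Exponent alpha in t^(-alpha) with alpha = 4 / (d - 2); requires d > 2.
    if d <= 2:
        raise ValueError("the rate t^{-4/(d-2)} is only meaningful for d > 2")
    return 4.0 / (d - 2)

def time_factor(d, risk_factor):
    # If a quantity decays like t^(-alpha), shrinking it by `risk_factor`
    # requires multiplying t by risk_factor^(1/alpha) = risk_factor^((d-2)/4).
    return risk_factor ** ((d - 2) / 4.0)

if __name__ == "__main__":
    for d in (6, 10, 22, 102):
        print(f"d={d:4d}  exponent={rate_exponent(d):.3f}  "
              f"time factor for 10x reduction={time_factor(d, 10.0):.3g}")
```

Since the required time factor is $10^{(d-2)/4}$ for a tenfold reduction, it grows exponentially in $d$, which is the sense in which such a rate reflects the curse of dimensionality.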