Kolmogorov Width Decay and Poor Approximators in Machine Learning: Shallow Neural Networks, Random Feature Models and Neural Tangent Kernels
This work addresses theoretical limitations in machine learning approximation theory, revealing fundamental gaps between different model classes, which is incremental but important for understanding neural network capabilities.
The paper tackles the problem of approximating functions with neural networks by showing that reproducing kernel Hilbert spaces are poor approximators for two-layer neural networks in high dimensions, and that multi-layer networks with small path norm poorly approximate certain Lipschitz functions, with results established in the L^2 topology.
We establish a scale separation of Kolmogorov width type between subspaces of a given Banach space under the condition that a sequence of linear maps converges much faster on one of the subspaces. The general technique is then applied to show that reproducing kernel Hilbert spaces are poor $L^2$-approximators for the class of two-layer neural networks in high dimension, and that multi-layer networks with small path norm are poor approximators for certain Lipschitz functions, also in the $L^2$-topology.