Overparameterized random feature regression with nearly orthogonal data
This work offers theoretical insight into the behavior of overparameterized neural networks for researchers in machine learning theory, although the contribution is incremental: it builds on existing analyses of kernel methods and random features.
The paper analyzes random feature ridge regression (RFRR) with nearly orthogonal data in the overparameterized regime, showing that its errors concentrate around those of a kernel ridge regression (KRR) derived from an expected kernel, and provides a lower bound for generalization error in a student-teacher model.
We investigate the properties of random feature ridge regression (RFRR) given by a two-layer neural network with random Gaussian initialization. We study the non-asymptotic behavior of RFRR with nearly orthogonal deterministic unit-length input data vectors in the overparameterized regime, where the width of the first layer is much larger than the sample size. Our analysis gives high-probability non-asymptotic concentration results for the training errors, cross-validation errors, and generalization errors of RFRR, centered around their respective values for a kernel ridge regression (KRR). This KRR is derived from an expected kernel generated by a nonlinear random feature map. We then approximate the performance of the KRR by a polynomial kernel matrix obtained from the Hermite polynomial expansion of the activation function, whose degree depends only on the orthogonality among different data points. This polynomial kernel determines the asymptotic behavior of both the RFRR and the KRR. Our results hold for a wide variety of activation functions and for input data sets that are nearly orthogonal. Based on these approximations, we obtain a lower bound for the generalization error of the RFRR for a nonlinear student-teacher model.
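The concentration of RFRR around KRR described above can be illustrated numerically. The following is a minimal sketch under assumptions that are not taken from the paper: the activation is ReLU (chosen because its expected kernel has a known closed form, the degree-1 arc-cosine kernel), the nearly orthogonal inputs are normalized Gaussian vectors in a moderately high dimension, and the dimensions `d`, `n`, `width` and the ridge parameter `lam` are arbitrary illustrative values. The function names are hypothetical helpers, not the authors' code.

```python
import numpy as np


def relu_expected_kernel(X):
    """Expected kernel E_w[relu(w.x_i) relu(w.x_j)] with w ~ N(0, I).

    Assumes the columns of X have unit length; for ReLU this is the
    closed-form degree-1 arc-cosine kernel divided by 2.
    """
    G = np.clip(X.T @ X, -1.0, 1.0)   # Gram matrix of pairwise cosines
    theta = np.arccos(G)              # pairwise angles
    return (np.sin(theta) + (np.pi - theta) * G) / (2.0 * np.pi)


def random_feature_kernel(X, width, rng):
    """Empirical kernel of a width-`width` random ReLU feature map."""
    d = X.shape[0]
    W = rng.standard_normal((width, d))            # Gaussian first-layer weights
    Phi = np.maximum(W @ X, 0.0) / np.sqrt(width)  # features, shape (width, n)
    return Phi.T @ Phi


def ridge_training_error(K, y, lam):
    """Mean squared training error of ridge regression with kernel matrix K.

    The training residual is y - K (K + lam I)^{-1} y = lam (K + lam I)^{-1} y.
    """
    n = len(y)
    residual = lam * np.linalg.solve(K + lam * np.eye(n), y)
    return float(residual @ residual / n)


rng = np.random.default_rng(0)
d, n, width, lam = 400, 50, 20000, 0.1   # illustrative values, width >> n

# Assumed data model: normalized Gaussian columns are nearly orthogonal
# when d is large (pairwise inner products are of order 1/sqrt(d)).
X = rng.standard_normal((d, n))
X /= np.linalg.norm(X, axis=0)
y = rng.standard_normal(n)               # arbitrary labels for illustration

K = relu_expected_kernel(X)              # kernel of the limiting KRR
K_N = random_feature_kernel(X, width, rng)

err_krr = ridge_training_error(K, y, lam)
err_rfrr = ridge_training_error(K_N, y, lam)
print("operator-norm kernel deviation:", np.linalg.norm(K_N - K, 2))
print("training error, KRR :", err_krr)
print("training error, RFRR:", err_rfrr)
```

Increasing `width` with `n` fixed should shrink both the kernel deviation and the gap between the two training errors, which is the qualitative behavior the concentration results describe; the sketch does not reproduce the paper's rates or constants.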