Dimension-free deterministic equivalents and scaling laws for random feature regression
This work provides theoretical tools for analyzing how random feature regression scales, with implications for practitioners using kernel methods and neural networks.
The authors developed a dimension-free deterministic equivalent for the test error of random feature ridge regression, showing that it depends only on the feature map eigenvalues and comes with non-asymptotic, multiplicative guarantees. They applied this equivalent to derive sharp excess error rates and to determine the smallest number of features needed to achieve the optimal minimax error rate under power-law spectral assumptions.
In this work we investigate the generalization performance of random feature ridge regression (RFRR). Our main contribution is a general deterministic equivalent for the test error of RFRR. Specifically, under a certain concentration property, we show that the test error is well approximated by a closed-form expression that only depends on the feature map eigenvalues. Notably, our approximation guarantee is non-asymptotic, multiplicative, and independent of the feature map dimension -- allowing for infinite-dimensional features. We expect this deterministic equivalent to hold broadly beyond our theoretical analysis, and we empirically validate its predictions on various real and synthetic datasets. As an application, we derive sharp excess error rates under standard power-law assumptions of the spectrum and target decay. In particular, we provide a tight result for the smallest number of features achieving optimal minimax error rate.
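To make the object of study concrete, the following is a minimal sketch of random feature ridge regression and of the empirical test error that the deterministic equivalent is meant to approximate. It does not implement the deterministic equivalent itself. Everything specific here is an assumption chosen for illustration and not taken from the paper: the random Fourier feature map, the Gaussian inputs, the sine target, the noise level, and the hypothetical function name `rfrr_test_error`.

```python
import numpy as np


def rfrr_test_error(n, p, lam, d=5, n_test=2000, noise=0.1, seed=0):
    """Empirical test error of random feature ridge regression (illustrative).

    n: number of training samples, p: number of random features,
    lam: ridge penalty. All modelling choices below are assumptions.
    """
    rng = np.random.default_rng(seed)

    # Random Fourier features (an assumed feature map): cos(<w, x> + b).
    W = rng.standard_normal((p, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=p)

    def features(X):
        return np.sqrt(2.0 / p) * np.cos(X @ W.T + b)

    # Assumed synthetic target: a sine of a random linear projection.
    beta = rng.standard_normal(d) / np.sqrt(d)

    def target(X):
        return np.sin(X @ beta)

    X_train = rng.standard_normal((n, d))
    y_train = target(X_train) + noise * rng.standard_normal(n)
    X_test = rng.standard_normal((n_test, d))
    y_test = target(X_test)  # noiseless labels, so this measures excess error

    # Ridge solution: minimise ||Z a - y||^2 + lam * ||a||^2 over a.
    Z = features(X_train)
    a = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y_train)

    pred = features(X_test) @ a
    return float(np.mean((pred - y_test) ** 2))


if __name__ == "__main__":
    # Sweep the number of features at a fixed sample size.
    for p in (10, 50, 200):
        print(p, rfrr_test_error(n=500, p=p, lam=1e-3))
```

Sweeping `p` at fixed `n`, as in the usage lines above, gives the kind of error curve that one would compare against a closed-form prediction when asking how many features are needed for a given sample size.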