Learning curves for Gaussian process regression with power-law priors and targets
This work provides theoretical tools for analyzing generalization error in GPR, KRR, and infinitely wide neural networks, though the contribution is incremental because it builds on existing spectral assumptions.
The authors characterized the power-law asymptotics of learning curves for Gaussian process regression (GPR) and kernel ridge regression (KRR) by assuming that the prior eigenspectrum and the target function's eigenexpansion coefficients follow power laws, and showed how the analysis applies to infinitely wide neural networks via known kernel spectra.
We characterize the power-law asymptotics of learning curves for Gaussian process regression (GPR) under the assumption that the eigenspectrum of the prior and the eigenexpansion coefficients of the target function follow a power law. Under similar assumptions, we leverage the equivalence between GPR and kernel ridge regression (KRR) to characterize the generalization error of KRR. Infinitely wide neural networks can be related to GPR with respect to the neural network GP kernel and the neural tangent kernel, which in several cases are known to have power-law spectra. Hence our methods can be applied to study the generalization error of infinitely wide neural networks. We present toy experiments demonstrating the theory.
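
As an illustration of the setting described above, the following is a minimal sketch of a toy learning-curve experiment, not the authors' experimental setup. It assumes a truncated cosine eigenbasis on [0, 1] with uniform inputs, prior eigenvalues `lam_k = k**(-alpha)`, and target coefficients `mu_k = k**(-beta)`. The exponents, noise level, truncation `n_modes`, and the helper name `gpr_excess_error` are all hypothetical choices made for the example.

```python
import numpy as np


def gpr_excess_error(n, alpha=2.0, beta=1.5, noise_std=0.1, n_modes=500, rng=None):
    """Excess generalization error of the GPR posterior mean for one random sample.

    Assumed toy model (not taken from the paper): eigenfunctions
    phi_k(x) = sqrt(2) * cos(2*pi*k*x), orthonormal under the uniform
    measure on [0, 1], with power-law eigenvalues and target coefficients.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = np.arange(1, n_modes + 1, dtype=float)
    lam = k ** (-alpha)   # prior eigenspectrum (power law)
    mu = k ** (-beta)     # target eigenexpansion coefficients (power law)

    x = rng.uniform(0.0, 1.0, size=n)
    Phi = np.sqrt(2.0) * np.cos(2.0 * np.pi * np.outer(x, k))  # shape (n, n_modes)
    y = Phi @ mu + noise_std * rng.standard_normal(n)          # noisy observations

    # Kernel matrix K = Phi diag(lam) Phi^T and posterior-mean coefficients.
    K = (Phi * lam) @ Phi.T
    w_hat = lam * (Phi.T @ np.linalg.solve(K + noise_std**2 * np.eye(n), y))

    # Orthonormal eigenfunctions: the L2 error equals the coefficient error.
    return float(np.sum((w_hat - mu) ** 2))


rng = np.random.default_rng(0)
ns = np.array([16, 32, 64, 128, 256])
n_trials = 20

# Average the excess error over independent draws of the training set.
errors = np.array(
    [np.mean([gpr_excess_error(n, rng=rng) for _ in range(n_trials)]) for n in ns]
)

# Slope of the learning curve on a log-log scale (empirical decay exponent).
slope = np.polyfit(np.log(ns), np.log(errors), 1)[0]
print("n:", ns)
print("mean excess error:", errors)
print("fitted log-log slope:", slope)
```

The fitted slope is only a rough empirical estimate of the decay exponent over this small range of `n`. Comparing it with the theoretical rate would require the paper's exact assumptions on the noise and the exponents, which this sketch does not reproduce.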