LG Jan 2, 2024

Optimal Rates of Kernel Ridge Regression under Source Condition in Large Dimensions

arXiv:2401.01270v17 citations h-index: 7
Originality: Incremental advance
AI Analysis

This work provides theoretical insights into the performance of kernel methods in large-dimensional settings, relevant for understanding neural networks via kernel approximations.

The paper determines the exact generalization error rates of kernel ridge regression in high dimensions under a source condition, showing that it is minimax optimal for 0<s≤1 but suboptimal for s>1 due to saturation effects.

Motivated by studies of neural networks (e.g., the neural tangent kernel theory), we study the large-dimensional behavior of kernel ridge regression (KRR) where the sample size $n \asymp d^{\gamma}$ for some $\gamma > 0$. Given an RKHS $\mathcal{H}$ associated with an inner product kernel defined on the sphere $\mathbb{S}^{d}$, we suppose that the true function $f_{\rho}^{*} \in [\mathcal{H}]^{s}$, the interpolation space of $\mathcal{H}$ with source condition $s>0$. We first determine the exact order (both upper and lower bound) of the generalization error of kernel ridge regression for the optimally chosen regularization parameter $\lambda$. We then show that when $0<s\le 1$, KRR is minimax optimal, and when $s>1$, KRR is not minimax optimal (a.k.a. the saturation effect). Our results illustrate that the curves of the rate varying along $\gamma$ exhibit the periodic plateau behavior and the multiple descent behavior, and show how the curves evolve with $s>0$. Interestingly, our work provides a unified viewpoint of several recent works on kernel regression in the large-dimensional setting, which correspond to $s=0$ and $s=1$, respectively.
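
The setting in the abstract can be sketched numerically. The following is a minimal illustration, not the paper's method or experiments: it fits KRR on points drawn uniformly from $\mathbb{S}^{d}$ with sample size $n \approx d^{\gamma}$ and estimates the excess risk on fresh test points. The kernel $k(x,z)=\exp(\langle x,z\rangle)$, the target function, the noise level, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def sample_sphere(n, d, rng):
    """Draw n points uniformly from the unit sphere S^d embedded in R^(d+1)."""
    x = rng.standard_normal((n, d + 1))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def inner_product_kernel(X, Z):
    """Illustrative inner-product kernel k(x, z) = exp(<x, z>).
    This choice is an assumption; the abstract does not fix a specific kernel."""
    return np.exp(X @ Z.T)

def krr_fit(X, y, lam):
    """Solve (K + n * lam * I) alpha = y for the KRR coefficients."""
    n = X.shape[0]
    K = inner_product_kernel(X, X)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def krr_predict(X_train, alpha, X_test):
    """Evaluate the fitted KRR function at the test points."""
    return inner_product_kernel(X_test, X_train) @ alpha

def excess_risk(d, gamma, lam, noise=0.1, n_test=2000, seed=0):
    """Monte Carlo estimate of the excess risk with n = round(d ** gamma) samples."""
    rng = np.random.default_rng(seed)
    n = int(round(d ** gamma))
    X = sample_sphere(n, d, rng)
    X_test = sample_sphere(n_test, d, rng)
    w = sample_sphere(1, d, rng)[0]

    def f_star(A):
        # Hypothetical target: a low-degree polynomial of <x, w>.
        t = A @ w
        return t + t ** 2

    y = f_star(X) + noise * rng.standard_normal(n)
    alpha = krr_fit(X, y, lam)
    pred = krr_predict(X, alpha, X_test)
    return n, float(np.mean((pred - f_star(X_test)) ** 2))

if __name__ == "__main__":
    for gamma in (1.0, 1.5, 2.0):
        n, risk = excess_risk(d=10, gamma=gamma, lam=1e-3)
        print(f"gamma={gamma}: n={n}, estimated excess risk={risk:.4f}")
```

Sweeping `gamma` and tuning `lam` for each value gives an empirical rate curve that could be compared qualitatively with the plateau and multiple descent behavior described above. At such a small `d` the asymptotic rates need not be visible.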
