On the Generalization Power of Overfitted Two-Layer Neural Tangent Kernel Models
This work is aimed at researchers in machine learning theory and addresses the theoretical understanding of overfitting in neural networks. It offers insight into when overparameterized models generalize well, but it is incremental, since it builds on existing NTK frameworks.
The paper analyzes the generalization performance of overfitted two-layer neural tangent kernel (NTK) models and shows that their test error behaves differently from that of other overparameterized linear models. For a class of learnable functions, the authors give an upper bound on the generalization error that approaches a small limiting value as the number of neurons grows, and this limit decreases with more training samples. For functions outside this class, a lower bound shows that the error does not diminish even when both the number of training samples and the number of neurons are large.
In this paper, we study the generalization performance of min $\ell_2$-norm overfitting solutions for the neural tangent kernel (NTK) model of a two-layer neural network with ReLU activation that has no bias term. We show that, depending on the ground-truth function, the test error of overfitted NTK models exhibits characteristics that are different from the "double-descent" of other overparameterized linear models with simple Fourier or Gaussian features. Specifically, for a class of learnable functions, we provide a new upper bound of the generalization error that approaches a small limiting value, even when the number of neurons $p$ approaches infinity. This limiting value further decreases with the number of training samples $n$. For functions outside of this class, we provide a lower bound on the generalization error that does not diminish to zero even when $n$ and $p$ are both large.