ML, LG, Feb 12, 2023

Generalization Ability of Wide Neural Networks on $\mathbb{R}$

arXiv:2302.05933v1, 26 citations, h-index: 73
Originality Incremental advance
AI Analysis

This work advances the theoretical understanding of generalization in neural networks and offers insight into training strategies such as early stopping. It is an incremental contribution that builds on existing NTK theory.

The study investigates the generalization ability of wide two-layer ReLU neural networks on $\mathbb{R}$. It establishes spectral properties of the neural tangent kernel (NTK) and shows that a network trained with early stopping achieves the minimax regression rate of $n^{-2/3}$, whereas a network trained until it overfits the data does not generalize well.
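
The exponent $2/3$ is consistent with a standard relation between kernel eigenvalue decay and minimax regression rates. The sketch below assumes that the regression function lies in a ball of the RKHS and that the kernel eigenvalues decay polynomially; the symbol $\beta$ is introduced here for illustration and does not appear in the summary.

$$
\lambda_i \asymp i^{-\beta},\ \beta > 1
\quad\Longrightarrow\quad
\text{minimax rate} \asymp n^{-\beta/(\beta+1)},
\qquad
\beta = 2 \ \Longrightarrow\ n^{-2/3}.
$$

With the NTK eigenvalue decay $i^{-2}$ on $\mathbb{R}$, this gives $\beta/(\beta+1) = 2/3$.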

We perform a study on the generalization ability of the wide two-layer ReLU neural network on $\mathbb{R}$. We first establish some spectral properties of the neural tangent kernel (NTK): $a)$ $K_{d}$, the NTK defined on $\mathbb{R}^{d}$, is positive definite; $b)$ $\lambda_{i}(K_{1})$, the $i$-th largest eigenvalue of $K_{1}$, is proportional to $i^{-2}$. We then show that: $i)$ when the width $m\rightarrow\infty$, the neural network kernel (NNK) uniformly converges to the NTK; $ii)$ the minimax rate of regression over the RKHS associated with $K_{1}$ is $n^{-2/3}$; $iii)$ if one adopts the early stopping strategy in training a wide neural network, the resulting neural network achieves the minimax rate; $iv)$ if one trains the neural network until it overfits the data, the resulting neural network cannot generalize well. Finally, we provide an explanation that reconciles our theory with the widely observed "benign overfitting" phenomenon.
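
To make item $i)$ concrete, the following minimal sketch computes the kernel of a finite-width two-layer ReLU network at initialization as the Gram matrix of its parameter gradients, then compares two independent wide draws. The parametrization (standard Gaussian initialization, $1/\sqrt{m}$ output scaling, a bias term) and the function name `empirical_ntk` are assumptions made for illustration; the paper's exact setup may differ.

```python
import numpy as np

def empirical_ntk(x, m, rng):
    """Gradient Gram matrix at initialization for the (assumed) network
    f(x) = m^{-1/2} * sum_r a_r * relu(w_r * x + b_r), with x in R."""
    w = rng.standard_normal(m)   # input weights
    b = rng.standard_normal(m)   # biases
    a = rng.standard_normal(m)   # output weights
    pre = np.outer(x, w) + b                  # (n, m) pre-activations
    act = np.maximum(pre, 0.0)                # sqrt(m) * df/da_r
    gate = (pre > 0).astype(float) * a        # a_r * 1[pre > 0]
    # Jacobian blocks with respect to a, w, and b, each of shape (n, m).
    jac = np.hstack([act, gate * x[:, None], gate]) / np.sqrt(m)
    return jac @ jac.T                        # (n, n) kernel matrix

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)

# Two independent random initializations at the same large width.
k_wide_1 = empirical_ntk(x, 20000, rng)
k_wide_2 = empirical_ntk(x, 20000, rng)

# If the kernel concentrates as the width grows, independent draws should nearly agree.
gap = np.max(np.abs(k_wide_1 - k_wide_2))
min_eig = np.linalg.eigvalsh(k_wide_1).min()
print(f"max entrywise gap between draws: {gap:.4f}")
print(f"smallest eigenvalue of the kernel matrix: {min_eig:.3e}")
```

A small gap between the two draws is what one would expect if the random kernel concentrates around a deterministic limit as the width grows. This is only a numerical illustration on a fixed grid, not a check of the uniform convergence result.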
