MLLG · May 4, 2023

On the Eigenvalue Decay Rates of a Class of Neural-Network Related Kernel Functions Defined on General Domains

arXiv:2305.02657v4 · 30 citations
Originality: Incremental advance
AI Analysis

This work advances the theoretical understanding of neural network training dynamics and generalization, and it will mainly interest researchers in machine learning theory. It is incremental in that it extends existing eigenvalue decay analysis from the sphere to more general domains.
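For readers new to EDR analysis, the standard setup in this line of work is summarized below. It is a generic textbook statement, not a result of the paper. Here $\mu$ is the input distribution on the domain, $\{e_i\}$ is orthonormal in $L^2(\mu)$, $\beta>1$ is a generic decay exponent, and the minimax statement holds under the usual assumptions of this literature (for example, bounded noise variance and $s$ in a suitable range).

$$
K(x,x') = \sum_{i\ge 1}\lambda_i\, e_i(x)\, e_i(x'), \qquad \lambda_1 \ge \lambda_2 \ge \dots > 0, \qquad \lambda_i \asymp i^{-\beta},
$$

$$
[\mathcal H]^{s} = \Big\{ \sum_{i\ge 1} a_i\, \lambda_i^{s/2} e_i \;:\; \sum_{i\ge 1} a_i^2 < \infty \Big\}, \qquad [\mathcal H]^{1} = \mathcal H,
$$

$$
\inf_{\hat f}\ \sup_{\|f^\star\|_{[\mathcal H]^{s}} \le R}\ \mathbb E\,\big\|\hat f - f^\star\big\|_{L^2(\mu)}^2 \;\asymp\; n^{-\frac{s\beta}{s\beta+1}}.
$$

Determining $\beta$ for a given kernel, here the NTK on a general domain, is therefore what pins down the optimal rate.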

The paper determines eigenvalue decay rates for a class of kernel functions, including neural tangent kernels (NTKs), on general domains. It proves that training a wide neural network is uniformly approximated by NTK regression, shows that wide networks are minimax optimal when the true function lies in an interpolation space $[\mathcal H_{\mathrm{NTK}}]^{s}$ of the NTK's RKHS, and shows that overfitted networks generalize poorly.
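One way to get an empirical feel for an EDR on a non-spherical domain is to sample points, form the kernel Gram matrix, and fit the log-log slope of its leading eigenvalues. The sketch below does this with NumPy for the NTK of a two-layer ReLU network written in the familiar arc-cosine form. The bias convention (appending a constant 1 to each input), the box domain $[-1,1]^d$, the helper names `relu_ntk` and `empirical_edr`, and the fitting window are illustrative assumptions. This is not the paper's strategy, which is analytic.

```python
import numpy as np

def relu_ntk(X, Z):
    """NTK of a two-layer ReLU network in arc-cosine form, up to constant factors.

    Hypothetical setup: a constant 1 is appended to every input to model a bias,
    so points of a domain in R^d become points in R^(d+1).
    """
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    Za = np.hstack([Z, np.ones((Z.shape[0], 1))])
    norms = np.outer(np.linalg.norm(Xa, axis=1), np.linalg.norm(Za, axis=1))
    dot = Xa @ Za.T
    u = np.clip(dot / norms, -1.0, 1.0)                  # cosine of the angle between inputs
    theta = np.arccos(u)
    k0 = (np.pi - theta) / np.pi                         # arc-cosine kernel of degree 0
    k1 = (np.sin(theta) + (np.pi - theta) * u) / np.pi   # arc-cosine kernel of degree 1
    return dot * k0 + norms * k1

def empirical_edr(kernel, X, i_min, i_max):
    """Fit log(lambda_i) ~ -beta * log(i) over 1-based indices i_min..i_max.

    Eigenvalues of K/n approximate those of the integral operator under the
    sampling distribution of X; only the leading ones are reliable.
    """
    n = X.shape[0]
    eig = np.linalg.eigvalsh(kernel(X, X) / n)[::-1]     # eigvalsh is ascending; flip to descending
    idx = np.arange(i_min, i_max + 1)
    slope, _ = np.polyfit(np.log(idx), np.log(eig[idx - 1]), 1)
    return -slope, eig

rng = np.random.default_rng(0)
d, n = 2, 1000
X = rng.uniform(-1.0, 1.0, size=(n, d))                  # a box, not a sphere
beta_hat, eig = empirical_edr(relu_ntk, X, 10, 100)
print(f"estimated decay exponent: {beta_hat:.2f}")
```

For comparison, in the sphere setting that the paper generalizes, the ReLU NTK on $\mathbb S^{d}$ is known to have eigenvalues decaying like $i^{-(d+1)/d}$. A finite-sample slope is only a rough indication: it depends on $n$ and on the index window, and it cannot substitute for a proved rate.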

In this paper, we provide a strategy to determine the eigenvalue decay rate (EDR) of a large class of kernel functions defined on a general domain rather than on $\mathbb S^{d}$. This class of kernel functions includes, but is not limited to, the neural tangent kernels associated with neural networks of different depths and with various activation functions. After proving that the dynamics of training wide neural networks uniformly approximate those of neural tangent kernel regression on general domains, we further show the minimax optimality of the wide neural network provided that the ground-truth function $f\in [\mathcal H_{\mathrm{NTK}}]^{s}$, an interpolation space associated with the RKHS $\mathcal{H}_{\mathrm{NTK}}$ of the NTK. We also show that the overfitted neural network cannot generalize well. We believe our approach for determining the EDR of kernels may also be of independent interest.
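The NTK regression dynamics mentioned in the abstract are kernel gradient flow. Assuming the squared loss $\frac{1}{2n}\sum_i (f(x_i)-y_i)^2$ and a predictor that is identically zero at initialization, the flow has the closed form $f_t(x) = K(x,X)\,K^{-1}\big(I - e^{-tK/n}\big)\,y$, where $K$ is the Gram matrix on the training inputs. The NumPy sketch below evaluates this through an eigendecomposition of $K/n$, reusing an arc-cosine-form ReLU NTK with a constant-1 bias coordinate. The kernel choice, the toy target, the noise level, and the helper names are hypothetical. When $K$ is invertible, letting $t\to\infty$ drives the training residual to zero, which is the overfitted (interpolating) regime, while a finite $t$ acts as early stopping.

```python
import numpy as np

def relu_ntk(X, Z):
    """Two-layer ReLU NTK (arc-cosine form, constant 1 appended as a bias); hypothetical setup."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    Za = np.hstack([Z, np.ones((Z.shape[0], 1))])
    norms = np.outer(np.linalg.norm(Xa, axis=1), np.linalg.norm(Za, axis=1))
    dot = Xa @ Za.T
    theta = np.arccos(np.clip(dot / norms, -1.0, 1.0))
    return dot * (np.pi - theta) / np.pi + norms * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / np.pi

def gradient_flow_predictor(kernel, X, y, t):
    """Kernel gradient flow at time t from f_0 = 0: f_t(x) = K(x, X) K^{-1} (I - exp(-t K / n)) y."""
    n = X.shape[0]
    mu, V = np.linalg.eigh(kernel(X, X) / n)
    mu = np.clip(mu, 0.0, None)                     # drop tiny negative round-off
    phi = np.full_like(mu, float(t))                # limit of the filter as mu -> 0
    pos = mu > 1e-12
    phi[pos] = -np.expm1(-t * mu[pos]) / mu[pos]    # (1 - e^{-t mu}) / mu
    alpha = V @ (phi * (V.T @ y)) / n               # so that f_t(x) = K(x, X) @ alpha
    return lambda Z: kernel(Z, X) @ alpha

rng = np.random.default_rng(1)
n = 300
X = rng.uniform(-1.0, 1.0, size=(n, 1))
f_star = lambda Z: np.sin(3.0 * Z[:, 0])            # hypothetical ground truth
y = f_star(X) + 0.3 * rng.standard_normal(n)        # noisy labels
X_test = rng.uniform(-1.0, 1.0, size=(2000, 1))

for t in [1.0, 10.0, 100.0, 1e3, 1e6]:
    f_t = gradient_flow_predictor(relu_ntk, X, y, t)
    train_mse = np.mean((f_t(X) - y) ** 2)
    test_mse = np.mean((f_t(X_test) - f_star(X_test)) ** 2)
    print(f"t={t:>9.0f}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

Printing train and test error across $t$ is one way to look at the gap between early stopping and interpolation on noisy labels; the numbers depend on the random draw and are not results from the paper.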
