ML · LG · Apr 29, 2024

Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks

arXiv:2404.18769v2 · 15.112 citations · h-index: 61 · J Mach Learn Res

Originality: Incremental advance

AI Analysis

This work advances the theoretical understanding of neural network generalization and is aimed at researchers. It offers improved bounds over kernel methods, but the advance is incremental: it refines existing analyses.

The paper tackles the problem of sample complexity and generalization for over-parameterized two-layer neural networks with norm constraints, showing that path and Barron norms yield width-independent sample complexity bounds and improved metric entropy of O(ε^{-2d/(d+2)}), leading to a generalization bound of O(n^{-(d+2)/(2d+2)}).
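To make the width-independence claim concrete, here is a minimal sketch in plain Python. It assumes one common definition of the path norm for a two-layer ReLU network, namely the sum over neurons of |a_k| (‖w_k‖₁ + |b_k|); the paper's exact definition may differ. The helper names (`two_layer_relu`, `path_norm`, `split_neurons`) are hypothetical and chosen only for illustration. Splitting every neuron into many copies increases the width but leaves both the function and the path norm unchanged, which is the intuition behind norm-based bounds that do not depend on width.

```python
import math
import random


def two_layer_relu(x, W, b, a):
    """Evaluate f(x) = sum_k a_k * relu(<w_k, x> + b_k) for one input x."""
    total = 0.0
    for w_k, b_k, a_k in zip(W, b, a):
        pre = sum(w * xi for w, xi in zip(w_k, x)) + b_k
        total += a_k * max(pre, 0.0)
    return total


def path_norm(W, b, a):
    """Assumed definition: sum_k |a_k| * (||w_k||_1 + |b_k|)."""
    return sum(
        abs(a_k) * (sum(abs(w) for w in w_k) + abs(b_k))
        for w_k, b_k, a_k in zip(W, b, a)
    )


def split_neurons(W, b, a, copies):
    """Duplicate each neuron `copies` times and divide its outer weight
    by `copies`, so the network gets wider without changing f."""
    W2 = [list(w_k) for w_k in W for _ in range(copies)]
    b2 = [b_k for b_k in b for _ in range(copies)]
    a2 = [a_k / copies for a_k in a for _ in range(copies)]
    return W2, b2, a2


random.seed(0)
d, m = 3, 4  # input dimension and original width (illustrative values)
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]
b = [random.gauss(0, 1) for _ in range(m)]
a = [random.gauss(0, 1) for _ in range(m)]
x = [random.gauss(0, 1) for _ in range(d)]

W_wide, b_wide, a_wide = split_neurons(W, b, a, copies=50)

print("width:", len(W), "->", len(W_wide))
print("same output:", math.isclose(two_layer_relu(x, W, b, a),
                                   two_layer_relu(x, W_wide, b_wide, a_wide),
                                   rel_tol=1e-9, abs_tol=1e-9))
print("same path norm:", math.isclose(path_norm(W, b, a),
                                      path_norm(W_wide, b_wide, a_wide),
                                      rel_tol=1e-9, abs_tol=1e-9))
```

This only illustrates invariance under one particular widening operation; it is not the paper's proof technique.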

Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space for modeling functions with neural networks, because the curse of dimensionality (CoD) cannot be evaded when approximating even a single ReLU neuron (Bach, 2017). In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms (e.g., the path norm, the Barron norm) from the perspective of sample complexity and generalization properties. First, we show that the path norm (as well as the Barron norm) yields width-independent sample complexity bounds, which allows for uniform convergence guarantees. Based on this result, we derive an improved metric entropy bound for $ε$-covering of $O(ε^{-\frac{2d}{d+2}})$ (where $d$ is the input dimension and the constant depends at most linearly on $d$) via the convex hull technique, which demonstrates a separation from kernel methods, which need $Ω(ε^{-d})$ to learn a target function in a Barron space. Second, this metric entropy result allows us to build a sharper generalization bound under a general moment hypothesis setting, achieving a rate of $O(n^{-\frac{d+2}{2d+2}})$. Our analysis is novel in that it offers a sharper, refined estimate of the metric entropy with linear dimension dependence, and it handles unbounded sampling in the estimation of the sample error and the output error.
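As a rough illustration of how the quoted exponents behave, the sketch below tabulates them for a few input dimensions. It uses only the three expressions stated in the abstract: $\frac{2d}{d+2}$ for the metric entropy, $d$ for the kernel-method figure, and $\frac{d+2}{2d+2}$ for the generalization rate. The function names are hypothetical, and constants and log factors are ignored.

```python
from fractions import Fraction


def entropy_exponent(d):
    """Exponent p in the metric entropy bound O(eps^{-p}), p = 2d / (d + 2)."""
    return Fraction(2 * d, d + 2)


def kernel_exponent(d):
    """Exponent p in the kernel-method figure Omega(eps^{-p}), p = d."""
    return Fraction(d)


def rate_exponent(d):
    """Exponent q in the generalization rate O(n^{-q}), q = (d + 2) / (2d + 2)."""
    return Fraction(d + 2, 2 * d + 2)


for d in (1, 2, 10, 100, 1000):
    print(
        f"d={d:5d}  entropy exp={float(entropy_exponent(d)):.4f}  "
        f"kernel exp={float(kernel_exponent(d)):.1f}  "
        f"rate exp={float(rate_exponent(d)):.4f}"
    )
```

The metric entropy exponent stays below 2 for every $d$, while the kernel exponent grows linearly in $d$. The rate exponent decreases toward $1/2$ as $d$ grows but never falls below it.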
