ML · LG · Apr 29, 2024

Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks

arXiv:2404.18769v2 · 12 citations · h-index: 61 · J Mach Learn Res
Originality: Incremental advance
AI Analysis

This work advances the theoretical understanding of neural network generalization, offering improved bounds over kernel methods, but it is incremental in that it refines existing analyses.

The paper studies sample complexity and generalization for over-parameterized two-layer neural networks under norm constraints. It shows that the path and Barron norms yield width-independent sample complexity bounds and an improved metric entropy bound of O(ε^{-2d/(d+2)}), which leads to a generalization bound of O(n^{-(d+2)/(2d+2)}).
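To make the dimension dependence of these rates concrete, the following is a minimal sketch that evaluates the exponents quoted above for a few input dimensions d. The function names are hypothetical and chosen only for illustration; the formulas are taken directly from the summary, and constants hidden by the O(·) notation are ignored.

```python
from fractions import Fraction


def entropy_exponent(d: int) -> Fraction:
    """Exponent a in the metric entropy bound O(eps^{-a}), with a = 2d / (d + 2)."""
    return Fraction(2 * d, d + 2)


def kernel_exponent(d: int) -> Fraction:
    """Exponent in the Omega(eps^{-d}) figure quoted for kernel methods."""
    return Fraction(d)


def rate_exponent(d: int) -> Fraction:
    """Exponent b in the generalization bound O(n^{-b}), with b = (d + 2) / (2d + 2)."""
    return Fraction(d + 2, 2 * d + 2)


if __name__ == "__main__":
    # Print the three exponents side by side for a few illustrative dimensions.
    for d in (1, 2, 10, 100):
        print(
            f"d={d:>3}  entropy exponent={float(entropy_exponent(d)):.3f}  "
            f"kernel exponent={float(kernel_exponent(d)):.0f}  "
            f"rate exponent={float(rate_exponent(d)):.3f}"
        )
```

For d = 2 the entropy exponent is 1 and the rate exponent is 2/3. As d grows, the entropy exponent stays below 2 while the kernel exponent grows linearly in d, and the rate exponent decreases toward 1/2 without reaching it.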

Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space for modeling functions by neural networks, as the curse of dimensionality (CoD) cannot be evaded when approximating even a single ReLU neuron (Bach, 2017). In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms (e.g., the path norm, the Barron norm) from the perspective of sample complexity and generalization properties. First, we show that the path norm (as well as the Barron norm) yields width-independent sample complexity bounds, which allows for uniform convergence guarantees. Based on this result, we derive an improved metric entropy bound for $ε$-covering of $O(ε^{-\frac{2d}{d+2}})$ ($d$ is the input dimension, and the constant depends at most linearly on $d$) via the convex hull technique, which demonstrates a separation from kernel methods, which need $Ω(ε^{-d})$ to learn the target function in a Barron space. Second, this metric entropy result allows us to build a sharper generalization bound under a general moment hypothesis setting, achieving the rate $O(n^{-\frac{d+2}{2d+2}})$. Our analysis is novel in that it offers a sharper, more refined estimate of the metric entropy with linear dependence on the dimension, and handles unbounded sampling in the estimation of the sample error and the output error.
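As a rough illustration of why a path-norm constraint can be independent of width, the sketch below computes a path norm for a two-layer ReLU network. It assumes the mean-field parameterization f(x) = (1/m) Σ_k a_k ReLU(⟨w_k, x⟩) and the path norm (1/m) Σ_k |a_k| ‖w_k‖_1. Both conventions are assumptions made for this example and may differ from the paper's exact definitions, and the names `two_layer_relu` and `path_norm` are hypothetical.

```python
def relu(z: float) -> float:
    return max(0.0, z)


def two_layer_relu(x, W, a):
    """f(x) = (1/m) * sum_k a_k * relu(<w_k, x>), with m = number of neurons (assumed scaling)."""
    m = len(a)
    total = 0.0
    for a_k, w_k in zip(a, W):
        pre_activation = sum(w_i * x_i for w_i, x_i in zip(w_k, x))
        total += a_k * relu(pre_activation)
    return total / m


def path_norm(W, a):
    """(1/m) * sum_k |a_k| * ||w_k||_1 (assumed definition for this sketch)."""
    m = len(a)
    return sum(abs(a_k) * sum(abs(w) for w in w_k) for a_k, w_k in zip(a, W)) / m


# Toy network with m = 2 neurons on 2-dimensional inputs (illustrative values only).
W = [[1.0, -2.0], [0.5, 0.5]]
a = [2.0, -1.0]
x = [1.0, 0.5]

# Duplicating every neuron doubles the width but changes neither f nor the path norm.
W_wide = W + W
a_wide = a + a

if __name__ == "__main__":
    print("f(x):", two_layer_relu(x, W, a), "wide f(x):", two_layer_relu(x, W_wide, a_wide))
    print("path norm:", path_norm(W, a), "wide path norm:", path_norm(W_wide, a_wide))
```

Under these assumptions, |f(x)| is at most the path norm times the largest absolute coordinate of x, because ReLU is 1-Lipschitz and |⟨w_k, x⟩| ≤ ‖w_k‖_1 ‖x‖_∞. The bound makes no reference to m, which is the intuition behind constraining the norm rather than the width.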

