LGAIOct 26, 2023

Optimization dependent generalization bound for ReLU networks based on sensitivity in the tangent bundle

arXiv:2310.17378v21 citationsh-index: 20
Originality Incremental advance
AI Analysis

This provides a theoretical foundation for understanding generalization in deep learning, though it is incremental as it builds on existing PAC and Rademacher complexity frameworks.

The paper tackles the problem of explaining why over-parametrized ReLU networks generalize well by proposing a PAC-type bound on generalization error based on gradient sensitivity along the optimization trajectory, with results experimentally verified on MNIST and CIFAR-10 datasets.

Recent advances in deep learning have given us some very promising results on the generalization ability of deep neural networks, however literature still lacks a comprehensive theory explaining why heavily over-parametrized models are able to generalize well while fitting the training data. In this paper we propose a PAC type bound on the generalization error of feedforward ReLU networks via estimating the Rademacher complexity of the set of networks available from an initial parameter vector via gradient descent. The key idea is to bound the sensitivity of the network's gradient to perturbation of the input data along the optimization trajectory. The obtained bound does not explicitly depend on the depth of the network. Our results are experimentally verified on the MNIST and CIFAR-10 datasets.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes