LGAIJun 9, 2022

Learning Non-Vacuous Generalization Bounds from Optimization

arXiv:2206.04359v22 citationsh-index: 26
Originality Incremental advance
AI Analysis

This addresses the challenge of understanding generalization in deep learning for researchers and practitioners, though it is incremental as it builds on existing complexity-based approaches.

The paper tackles the problem of deriving informative generalization bounds for deep neural networks by proposing a non-vacuous bound based on optimization dynamics, achieving plausible guarantees for models like ResNet and Vision Transformer on large-scale datasets such as ImageNet-1K.

One of the fundamental challenges in the deep learning community is to theoretically understand how well a deep neural network generalizes to unseen data. However, current approaches often yield generalization bounds that are either too loose to be informative of the true generalization error or only valid to the compressed nets. In this study, we present a simple yet non-vacuous generalization bound from the optimization perspective. We achieve this goal by leveraging that the hypothesis set accessed by stochastic gradient algorithms is essentially fractal-like and thus can derive a tighter bound over the algorithm-dependent Rademacher complexity. The main argument rests on modeling the discrete-time recursion process via a continuous-time stochastic differential equation driven by fractional Brownian motion. Numerical studies demonstrate that our approach is able to yield plausible generalization guarantees for modern neural networks such as ResNet and Vision Transformer, even when they are trained on a large-scale dataset (e.g. ImageNet-1K).

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes