MLLGMay 13

A Survey on Data-Dependent Worst-Case Generalization Bounds

arXiv:2605.139132.8
Predicted impact top 66% in ML · last 90 daysOriginality Synthesis-oriented
AI Analysis

For theorists seeking tighter generalization guarantees in deep learning, this survey provides a structured overview of data-dependent bounds that avoid the vacuity of uniform convergence.

This survey organizes recent work on non-vacuous generalization bounds for overparameterized neural networks by unifying PAC-Bayesian, geometric, and stability-based approaches into a single template inequality, enabling head-to-head comparison of bounds.

Deep neural networks generalize well despite being heavily overparameterized, in apparent contradiction with classical learning theory based on uniform convergence over fixed hypothesis spaces. Uniform bounds over the entire parameter space are vacuous in this regime, and recent work has shown that non-vacuous guarantees can be recovered by restricting attention to the part of parameter space that the algorithm actually visits. This survey paper organizes this line of work around three steps: extending PAC-Bayesian theory to random, data-dependent hypothesis sets (arXiv:2404.17442); refining the complexity term with geometric and topological descriptors of the optimization trajectory, including fractal dimensions, alpha-weighted lifetime sums, and positive magnitude (arXiv:2006.09313, arXiv:2302.02766, arXiv:2407.08723); and replacing the resulting information-theoretic terms by stability assumptions (arXiv:2507.06775). We unify these contributions around a single template inequality and a head-to-head comparison of the resulting bounds.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes