ML · LG · Feb 13, 2024

A PAC-Bayesian Link Between Generalisation and Flat Minima

Maxime Haddouche, Paul Viallard, Umut Simsekli, Benjamin Guedj

arXiv:2402.08508v2 · 17.612 citations · h-index: 30 · ALT

Originality: Highly original

AI Analysis

This work addresses a foundational theoretical problem in machine learning that is relevant to researchers studying generalization in overparameterized settings.

The paper tackles the challenge of explaining generalization in overparameterized models by deriving novel generalization bounds that incorporate gradient terms, highlighting the positive influence of flat minima on generalization performance.

Modern machine learning usually involves predictors in the overparameterised setting (the number of trained parameters exceeds the dataset size), and training them yields not only good performance on the training data but also good generalisation capacity. This phenomenon challenges many theoretical results and remains an open problem. To reach a better understanding, we provide novel generalisation bounds involving gradient terms. To do so, we combine the PAC-Bayes toolbox with Poincaré and Log-Sobolev inequalities, avoiding an explicit dependency on the dimension of the predictor space. Our results highlight the positive influence of flat minima (minima whose neighbourhood also nearly minimises the learning problem) on generalisation performance, directly involving the benefits of the optimisation phase.
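The abstract relies on Poincaré-type inequalities, which bound how much a loss can fluctuate under a distribution over predictors by an expected squared gradient norm. The sketch below is a hedged illustration of that ingredient only, not the authors' bound. It uses the standard Gaussian Poincaré inequality, Var_Q[f] ≤ σ² E_Q[‖∇f‖²] for Q = N(w*, σ² I), and estimates both sides by Monte Carlo for two toy quadratic losses that share a minimum at the origin. The helper names `poincare_check` and `make_quadratic`, the curvature values, and the choice of an isotropic Gaussian posterior are all assumptions made for illustration.

```python
import numpy as np


def make_quadratic(h):
    """Hypothetical toy loss f(w) = 0.5 * sum_i h_i * w_i^2 with its gradient.

    Small curvatures h give a flat minimum at 0, large ones a sharp minimum.
    Both functions act row-wise on a batch W of shape (n, d).
    """
    def loss(W):
        return 0.5 * np.sum(h * W**2, axis=1)

    def grad(W):
        return h * W

    return loss, grad


def poincare_check(loss, grad, w_star, sigma, n_samples=100_000, seed=0):
    """Monte Carlo estimates of both sides of the Gaussian Poincare inequality
    Var_Q[loss] <= sigma^2 * E_Q[||grad loss||^2], with Q = N(w_star, sigma^2 I).
    """
    rng = np.random.default_rng(seed)
    d = w_star.shape[0]
    # Draw predictors from the Gaussian "posterior" centred at the minimum.
    samples = w_star + sigma * rng.standard_normal((n_samples, d))
    lhs = loss(samples).var()  # fluctuation of the loss under Q
    rhs = sigma**2 * np.mean(np.sum(grad(samples) ** 2, axis=1))  # gradient term
    return lhs, rhs


if __name__ == "__main__":
    d, sigma = 10, 0.1
    w_star = np.zeros(d)
    for name, curvature in [("flat", 0.1), ("sharp", 10.0)]:
        loss, grad = make_quadratic(np.full(d, curvature))
        lhs, rhs = poincare_check(loss, grad, w_star, sigma)
        print(f"{name}: Var = {lhs:.2e} <= sigma^2 E||grad||^2 = {rhs:.2e}")
```

In this toy setting the flat minimum (curvature 0.1) produces a much smaller gradient term than the sharp one (curvature 10), so the loss concentrates more tightly under the same Gaussian perturbation. This informally mirrors why bounds built on gradient terms can favour flat minima.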
