ML · LG · OC · Apr 14, 2023

Wasserstein PAC-Bayes Learning: Exploiting Optimisation Guarantees to Explain Generalisation

arXiv:2304.07048v24 citations · h-index: 21
Originality: Incremental advance
AI Analysis

This work addresses a theoretical gap in machine learning generalization theory, though it appears incremental because it builds on the existing Wasserstein PAC-Bayes framework.

The paper tackles the problem of improving PAC-Bayes generalization bounds by replacing KL divergence with Wasserstein distances to better capture loss function geometry, and demonstrates that optimization guarantees lead to strong generalization, including specific bounds for Bures-Wasserstein SGD.

PAC-Bayes learning is an established framework both to assess the generalisation ability of learning algorithms and to design new learning algorithms by exploiting generalisation bounds as training objectives. Most of the existing bounds involve a Kullback-Leibler (KL) divergence, which fails to capture the geometric properties of the loss function that are often useful in optimisation. We address this by extending the emerging Wasserstein PAC-Bayes theory. We develop new PAC-Bayes bounds with Wasserstein distances replacing the usual KL, and demonstrate that sound optimisation guarantees translate to good generalisation abilities. In particular, we provide generalisation bounds for the Bures-Wasserstein SGD by exploiting its optimisation properties.
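
To make the contrast between the KL divergence and the Wasserstein distance concrete, the sketch below compares the two for Gaussian distributions, where both have standard closed forms (the 2-Wasserstein distance between Gaussians is the Bures-Wasserstein distance). It is an illustration only and is not taken from the paper: the function names `psd_sqrt`, `w2_squared_gaussian`, and `kl_gaussian`, and the choice of a standard-normal prior with a shrinking isotropic posterior, are assumptions made for this example.

```python
import numpy as np


def psd_sqrt(a):
    """Square root of a symmetric positive semi-definite matrix."""
    a = 0.5 * (a + a.T)              # symmetrise against round-off
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)        # clamp tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T


def w2_squared_gaussian(m1, s1, m2, s2):
    """Squared 2-Wasserstein distance between N(m1, s1) and N(m2, s2)."""
    r = psd_sqrt(s2)
    cross = psd_sqrt(r @ s1 @ r)
    return float(np.sum((m1 - m2) ** 2) + np.trace(s1 + s2 - 2.0 * cross))


def kl_gaussian(m1, s1, m2, s2):
    """KL(N(m1, s1) || N(m2, s2)) for non-singular covariances."""
    d = m1.shape[0]
    s2_inv = np.linalg.inv(s2)
    diff = m2 - m1
    _, logdet1 = np.linalg.slogdet(s1)
    _, logdet2 = np.linalg.slogdet(s2)
    return float(0.5 * (np.trace(s2_inv @ s1) + diff @ s2_inv @ diff
                        - d + logdet2 - logdet1))


# Hypothetical setup: standard-normal prior, posterior concentrating on a point.
dim = 2
prior_mean, prior_cov = np.zeros(dim), np.eye(dim)
post_mean = np.array([1.0, 0.0])

for eps in (1.0, 1e-1, 1e-2, 1e-3):
    post_cov = (eps ** 2) * np.eye(dim)
    kl = kl_gaussian(post_mean, post_cov, prior_mean, prior_cov)
    w2 = w2_squared_gaussian(post_mean, post_cov, prior_mean, prior_cov)
    print(f"eps={eps:g}  KL={kl:.3f}  W2^2={w2:.3f}")
```

As the posterior covariance shrinks, the KL term grows without bound (it diverges for a point mass), while the squared Wasserstein distance stays finite and depends on how far the posterior sits from the prior in parameter space. This is one simple sense in which a Wasserstein term reflects geometry that the KL term does not.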

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
