LGMLMay 23, 2024

Recursive PAC-Bayes: A Frequentist Approach to Sequential Prior Updates with No Information Loss

arXiv:2405.14681v35 citationsh-index: 19NIPS
Originality Highly original
AI Analysis

This addresses a long-standing limitation in PAC-Bayes for researchers in statistical learning theory, enabling more efficient sequential data processing without incremental improvements.

The paper tackles the problem of sequential prior updates in PAC-Bayesian analysis, where confidence information was lost during updates, and presents a novel procedure that eliminates this information loss, resulting in significant empirical outperformance over state-of-the-art methods.

PAC-Bayesian analysis is a frequentist framework for incorporating prior knowledge into learning. It was inspired by Bayesian learning, which allows sequential data processing and naturally turns posteriors from one processing step into priors for the next. However, despite two and a half decades of research, the ability to update priors sequentially without losing confidence information along the way remained elusive for PAC-Bayes. While PAC-Bayes allows construction of data-informed priors, the final confidence intervals depend only on the number of points that were not used for the construction of the prior, whereas confidence information in the prior, which is related to the number of points used to construct the prior, is lost. This limits the possibility and benefit of sequential prior updates, because the final bounds depend only on the size of the final batch. We present a novel and, in retrospect, surprisingly simple and powerful PAC-Bayesian procedure that allows sequential prior updates with no information loss. The procedure is based on a novel decomposition of the expected loss of randomized classifiers. The decomposition rewrites the loss of the posterior as an excess loss relative to a downscaled loss of the prior plus the downscaled loss of the prior, which is bounded recursively. As a side result, we also present a generalization of the split-kl and PAC-Bayes-split-kl inequalities to discrete random variables, which we use for bounding the excess losses, and which can be of independent interest. In empirical evaluation the new procedure significantly outperforms state-of-the-art.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes