ML · LG · Mar 23

Overfitting and Generalizing with (PAC) Bayesian Prediction in Noisy Binary Classification

arXiv:2603.22644 · 29.9 · h-index: 12
AI Analysis

This work addresses overfitting in Bayesian prediction methods for binary classification. It is incremental in that it extends prior research on discrete priors to continuous settings.

The paper tackles the problem of overfitting in PAC-Bayes learning rules for binary classification, showing that standard Bayesian predictors can lead to non-vanishing excess loss in agnostic cases, while a modified approach with a sample-size-dependent prior ensures uniformly vanishing excess loss.

We consider a PAC-Bayes type learning rule for binary classification, balancing the training error of a randomized "posterior" predictor with its KL divergence to a pre-specified "prior". This can be seen as an extension of a modified two-part-code Minimum Description Length (MDL) learning rule to continuous priors and randomized predictions. With a balancing parameter of $λ=1$, this learning rule recovers an (empirical) Bayes posterior, and a modified variant recovers the profile posterior, linking with standard Bayesian prediction (up to the treatment of the single-parameter noise level). However, from a risk-minimization prediction perspective, this Bayesian predictor overfits and can lead to non-vanishing excess loss in the agnostic case. Instead, a choice of $λ\gg 1$, which can be seen as using a sample-size-dependent prior, ensures uniformly vanishing excess loss even in the agnostic case. We precisely characterize the effect of under-regularizing (and over-regularizing) as a function of the balance parameter $λ$, identifying the regimes in which this under-regularization is tempered or catastrophic. This extends previous work by Zhu and Srebro [2025], which considered only discrete priors, to PAC-Bayes type learning rules and, through their rigorous Bayesian interpretation, to Bayesian prediction more generally.
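
To make the trade-off between training error and KL divergence concrete, here is a minimal sketch on a finite hypothesis class. It is not the paper's exact rule: the abstract does not state the objective, so the form E_Q[err] + (λ / (βn)) · KL(Q‖P) is an assumption, and the names `gibbs_posterior`, `objective`, and the inverse-temperature `beta` are hypothetical. The paper's handling of the noise level is not modelled. For an objective of this generic form, the minimizer is the Gibbs distribution Q(h) ∝ P(h) · exp(-(βn/λ) · err(h)), so a larger λ keeps the posterior closer to the prior.

```python
import numpy as np

def gibbs_posterior(prior, train_errors, n, lam, beta=1.0):
    """Minimizer over Q of  E_Q[err] + (lam / (beta * n)) * KL(Q || prior).

    Assumed generic PAC-Bayes-style objective on a finite class;
    `beta` is a hypothetical inverse temperature, not taken from the paper.
    """
    prior = np.asarray(prior, dtype=float)
    train_errors = np.asarray(train_errors, dtype=float)
    # Unnormalised log-weights: log P(h) - (beta * n / lam) * err(h)
    with np.errstate(divide="ignore"):  # allow log(0) = -inf for zero-prior hypotheses
        log_w = np.log(prior) - (beta * n / lam) * train_errors
    log_w -= log_w.max()  # subtract the max for numerical stability
    w = np.exp(log_w)
    return w / w.sum()

def objective(q, prior, train_errors, n, lam, beta=1.0):
    """Value of the assumed objective at a distribution q (needs prior > 0 wherever q > 0)."""
    q = np.asarray(q, dtype=float)
    prior = np.asarray(prior, dtype=float)
    train_errors = np.asarray(train_errors, dtype=float)
    mask = q > 0  # terms with q(h) = 0 contribute 0 to the KL divergence
    kl = float(np.sum(q[mask] * np.log(q[mask] / prior[mask])))
    return float(q @ train_errors) + lam / (beta * n) * kl

if __name__ == "__main__":
    prior = np.ones(3) / 3            # uniform prior over three hypotheses
    errs = np.array([0.1, 0.2, 0.4])  # made-up training errors, for illustration only
    for lam in (1.0, 10.0, 100.0):
        print(lam, gibbs_posterior(prior, errs, n=50, lam=lam))
```

Running the script shows the posterior concentrating on the lowest-training-error hypothesis for small λ and flattening toward the prior as λ grows, which is the qualitative under- versus over-regularization trade-off the abstract describes.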

