How good is PAC-Bayes at explaining generalisation?
This work examines theoretical limits on what PAC-Bayes bounds can say about generalisation, a question of interest to machine learning researchers. Its contribution is incremental, since it builds on existing PAC-Bayes analysis.
The paper investigates necessary conditions for PAC-Bayes bounds to offer meaningful generalisation guarantees. It finds that the optimal guarantee depends on the risk distribution induced by the prior, and that the prior must place sufficient mass on high-performing predictors. It critiques the use of data-dependent priors in deep learning and questions whether PAC-Bayes truly explains generalisation.
We discuss necessary conditions for a PAC-Bayes bound to provide a meaningful generalisation guarantee. Our analysis reveals that the optimal generalisation guarantee depends solely on the distribution of the risk induced by the prior distribution. In particular, a target generalisation level is achievable only if the prior places sufficient mass on high-performing predictors. We relate these requirements to the prevalent practice of using data-dependent priors in deep learning PAC-Bayes applications, and discuss the implications for the claim that PAC-Bayes ``explains'' generalisation.
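
The role of prior mass can be illustrated numerically. The following is a minimal sketch, not the analysis of the paper: it assumes a McAllester-style bound with complexity term sqrt((KL(Q||P) + ln(2 sqrt(n)/delta)) / (2n)), and it considers only one particular posterior Q, namely the prior P conditioned on a set of predictors with prior mass p, for which KL(Q||P) = ln(1/p). The function names and the default delta are hypothetical choices made for illustration.

```python
import math


def restricted_posterior_kl(prior_mass: float) -> float:
    """KL(Q || P) when Q is the prior P conditioned on a set of prior mass `prior_mass`."""
    if not 0.0 < prior_mass <= 1.0:
        raise ValueError("prior_mass must lie in (0, 1]")
    return -math.log(prior_mass)


def complexity_term(prior_mass: float, n: int, delta: float = 0.05) -> float:
    """Complexity term of the assumed McAllester-style bound for the conditioned prior."""
    kl = restricted_posterior_kl(prior_mass)
    return math.sqrt((kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n))


def min_prior_mass(target_gap: float, n: int, delta: float = 0.05):
    """Smallest prior mass for which the complexity term is at most `target_gap`.

    Returns None when even prior mass 1 (KL = 0) cannot reach the target
    at this sample size under the assumed bound.
    """
    # Rearranging the complexity term: ln(1/p) <= 2 n gap^2 - ln(2 sqrt(n) / delta).
    max_kl = 2.0 * n * target_gap ** 2 - math.log(2.0 * math.sqrt(n) / delta)
    if max_kl < 0.0:
        return None
    return math.exp(-max_kl)


if __name__ == "__main__":
    n = 10_000
    for p in (0.5, 1e-3, 1e-9):
        print(f"prior mass {p:g}: complexity term {complexity_term(p, n):.4f}")
    print("minimum prior mass for a gap of 0.05:", min_prior_mass(0.05, n))
```

Under these assumptions the complexity term grows as the prior mass on the chosen set of predictors shrinks, which matches the qualitative requirement stated in the abstract. The exact thresholds depend on the bound form assumed here.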