Integral Probability Metrics PAC-Bayes Bounds
This work addresses the need for more flexible generalization analysis in machine learning, particularly for algorithms that use large hypothesis spaces. The contribution appears incremental, since it builds on existing PAC-Bayes frameworks.
The paper aims to derive tighter generalization bounds for machine learning algorithms. It introduces PAC-Bayes-style bounds that replace the KL-divergence with Integral Probability Metrics (IPMs), such as the total variation metric and the Wasserstein distance. The resulting bounds interpolate between worst-case uniform convergence bounds and improved bounds in favorable, data-dependent cases.
We present a PAC-Bayes-style generalization bound which enables the replacement of the KL-divergence with a variety of Integral Probability Metrics (IPM). We provide instances of this bound with the IPM being the total variation metric and the Wasserstein distance. A notable feature of the obtained bounds is that they naturally interpolate between classical uniform convergence bounds in the worst case (when the prior and posterior are far away from each other), and improved bounds in favorable cases (when the posterior and prior are close). This illustrates the possibility of reinforcing classical generalization bounds with algorithm- and data-dependent components, thus making them more suitable to analyze algorithms that use a large hypothesis space.
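To make the two IPM instances named above concrete, the following minimal sketch computes the total variation metric and the 1-Wasserstein distance between a prior and a posterior over a small discrete, one-dimensional hypothesis grid. It does not reproduce the paper's bound. The grid, the probability vectors, and the function names are illustrative assumptions, and the total variation is normalized to lie in [0, 1], which may differ from the paper's convention.

```python
from itertools import accumulate


def total_variation(p, q):
    """Total variation between two discrete distributions on the same
    support, normalized to lie in [0, 1] (an assumed convention)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def wasserstein_1d(p, q, points):
    """1-Wasserstein distance between two discrete distributions on the
    sorted 1-D support `points`: the area between the two CDFs."""
    cdf_gap = accumulate(a - b for a, b in zip(p, q))
    widths = (x1 - x0 for x0, x1 in zip(points, points[1:]))
    # zip stops after the last interval, so the final CDF gap (zero) is unused.
    return sum(abs(g) * w for g, w in zip(cdf_gap, widths))


# Hypothetical hypothesis grid and distributions, chosen only for illustration.
points = [0.0, 1.0, 2.0, 3.0]
prior = [0.25, 0.25, 0.25, 0.25]
posterior_near = [0.30, 0.20, 0.25, 0.25]  # close to the prior
posterior_far = [0.0, 0.0, 0.0, 1.0]       # concentrated far from the prior

for name, post in [("near", posterior_near), ("far", posterior_far)]:
    tv = total_variation(prior, post)
    w1 = wasserstein_1d(prior, post, points)
    print(f"{name}: TV={tv:.3f}  W1={w1:.3f}")
```

In this toy setting, the posterior near the prior yields small values of both metrics, which corresponds to the favorable regime described in the abstract. The posterior concentrated far from the prior yields large values, which corresponds to the regime where the bounds fall back to uniform convergence.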