On the properties of variational approximations of Gibbs posteriors
This work addresses the computational bottleneck of slow sampling for big datasets in PAC-Bayesian learning, offering a faster alternative with theoretical guarantees, though it is incremental as it builds on existing variational methods.
The paper tackles the intractability of Gibbs posteriors in PAC-Bayesian methods by proposing variational approximations, finding that these approximations often achieve the same convergence rate as the original procedures. It demonstrates this with applications in classification, ranking, and matrix completion, showing good performance on real datasets.
The PAC-Bayesian approach is a powerful set of techniques to derive non- asymptotic risk bounds for random estimators. The corresponding optimal distribution of estimators, usually called the Gibbs posterior, is unfortunately intractable. One may sample from it using Markov chain Monte Carlo, but this is often too slow for big datasets. We consider instead variational approximations of the Gibbs posterior, which are fast to compute. We undertake a general study of the properties of such approximations. Our main finding is that such a variational approximation has often the same rate of convergence as the original PAC-Bayesian procedure it approximates. We specialise our results to several learning tasks (classification, ranking, matrix completion),discuss how to implement a variational approximation in each case, and illustrate the good properties of said approximation on real datasets.