ML COJun 9, 2015

Variational consensus Monte Carlo

Maxim Rabinovich, Elaine Angelino, Michael I. Jordan

arXiv:1506.03074v113.451 citations

Originality Incremental advance

AI Analysis

This work addresses the scalability bottleneck in Bayesian statistics for practitioners dealing with large datasets, offering a novel variational approach that is incremental but provides measurable gains.

The paper tackles the problem of scaling Bayesian inference to large datasets by introducing variational consensus Monte Carlo (VCMC), which optimizes aggregation functions to improve posterior approximations over existing consensus Monte Carlo, achieving error reductions of up to 39% in probit regression and 92% in Gaussian mixture models.

Practitioners of Bayesian statistics have long depended on Markov chain Monte Carlo (MCMC) to obtain samples from intractable posterior distributions. Unfortunately, MCMC algorithms are typically serial, and do not scale to the large datasets typical of modern machine learning. The recently proposed consensus Monte Carlo algorithm removes this limitation by partitioning the data and drawing samples conditional on each partition in parallel (Scott et al, 2013). A fixed aggregation function then combines these samples, yielding approximate posterior samples. We introduce variational consensus Monte Carlo (VCMC), a variational Bayes algorithm that optimizes over aggregation functions to obtain samples from a distribution that better approximates the target. The resulting objective contains an intractable entropy term; we therefore derive a relaxation of the objective and show that the relaxed problem is blockwise concave under mild conditions. We illustrate the advantages of our algorithm on three inference tasks from the literature, demonstrating both the superior quality of the posterior approximation and the moderate overhead of the optimization step. Our algorithm achieves a relative error reduction (measured against serial MCMC) of up to 39% compared to consensus Monte Carlo on the task of estimating 300-dimensional probit regression parameter expectations; similarly, it achieves an error reduction of 92% on the task of estimating cluster comembership probabilities in a Gaussian mixture model with 8 components in 8 dimensions. Furthermore, these gains come at moderate cost compared to the runtime of serial MCMC, achieving near-ideal speedup in some instances.

View on arXiv PDF

Similar