ML · Nov 6, 2014

Sublinear-Time Approximate MCMC Transitions for Probabilistic Programs

Yutian Chen, Vikash Mansinghka, Zoubin Ghahramani

arXiv:1411.1690v2

Originality: Incremental advance

AI Analysis

This work addresses the scalability of inference in probabilistic programming, a practical concern for machine learning practitioners. It speeds up MCMC inference for complex, highly coupled models, though it builds incrementally on existing approximate Metropolis-Hastings (MH) techniques.

The paper tackles slow Bayesian parameter estimation for highly coupled models such as regressions and state-space models, where each MCMC transition takes time linear in the number of observations. It introduces a sublinear-time algorithm for Metropolis-Hastings updates that generalizes approximate MH techniques: instead of subsampling independent data items, it subsamples edges in a graphical model. The authors report sublinear per-transition scaling in applications including Bayesian logistic regression and stochastic volatility models.

Probabilistic programming languages can simplify the development of machine learning techniques, but only if inference is sufficiently scalable. Unfortunately, Bayesian parameter estimation for highly coupled models such as regressions and state-space models still scales poorly; each MCMC transition takes linear time in the number of observations. This paper describes a sublinear-time algorithm for making Metropolis-Hastings (MH) updates to latent variables in probabilistic programs. The approach generalizes recently introduced approximate MH techniques: instead of subsampling data items assumed to be independent, it subsamples edges in a dynamically constructed graphical model. It thus applies to a broader class of problems and interoperates with other general-purpose inference techniques. Empirical results, including confirmation of sublinear per-transition scaling, are presented for Bayesian logistic regression, nonlinear classification via joint Dirichlet process mixtures, and parameter estimation for stochastic volatility models (with state estimation via particle MCMC). All three applications use the same implementation, and each requires under 20 lines of probabilistic code.
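The starting point the abstract describes, approximate MH that subsamples independent data items, can be sketched as follows. This is a minimal, illustrative sketch assuming a sequential-test formulation. With l_i = log p(x_i | θ') - log p(x_i | θ), exact MH accepts iff the mean of the l_i exceeds μ0 = (1/N) log[u p(θ) q(θ'|θ) / (p(θ') q(θ|θ'))]. The sketch therefore evaluates a growing random subset of terms and stops once a test is confident about the sign of (mean - μ0). It shows the flat, i.i.d.-data baseline that the paper generalizes, not the paper's edge-subsampling algorithm over a dynamically constructed graphical model. The function names (`mh_threshold`, `approx_mh_accept`), the batch size, the tolerance `eps`, and the normal approximation to the Student-t tail are all hypothetical choices made for illustration.

```python
import math
import random
from typing import Callable, Optional, Tuple


def mh_threshold(u: float, log_prior_cur: float, log_prior_prop: float,
                 log_q_fwd: float, log_q_rev: float, n_total: int) -> float:
    """Return mu0 such that exact MH accepts iff mean_i l_i > mu0.

    log_q_fwd = log q(theta' | theta), log_q_rev = log q(theta | theta').
    """
    return (math.log(u) + log_prior_cur - log_prior_prop
            + log_q_fwd - log_q_rev) / n_total


def approx_mh_accept(log_lik_diff: Callable[[int], float], n_total: int,
                     mu0: float, batch_size: int = 50, eps: float = 0.05,
                     rng: Optional[random.Random] = None) -> Tuple[bool, int]:
    """Hypothetical sequential-test MH decision on a random subset of terms.

    log_lik_diff(i) must return l_i = log p(x_i | theta') - log p(x_i | theta).
    Returns (accept, number_of_terms_evaluated).
    """
    if n_total < 1 or batch_size < 1:
        raise ValueError("n_total and batch_size must be positive")
    if rng is None:
        rng = random.Random()
    # O(N) shuffle kept for clarity; a sublinear version would draw indices lazily.
    order = list(range(n_total))
    rng.shuffle(order)  # visit terms in random order, i.e. without replacement
    total = total_sq = 0.0
    n = 0
    while True:
        for i in order[n:n + batch_size]:
            l = log_lik_diff(i)
            total += l
            total_sq += l * l
        n = min(n + batch_size, n_total)
        mean = total / n
        if n == n_total:
            return mean > mu0, n  # every term seen: this is the exact MH decision
        if n < 2:
            continue  # need at least two terms to estimate a variance
        var = max(total_sq / n - mean * mean, 0.0) * n / (n - 1)
        # Standard error of the mean with a finite-population correction.
        se = math.sqrt(var / n) * math.sqrt(1.0 - (n - 1) / (n_total - 1))
        if se > 0.0:
            t = (mean - mu0) / se
            # Normal approximation to the Student-t tail probability (assumption).
            tail = 0.5 * math.erfc(abs(t) / math.sqrt(2.0))
            if tail < eps:
                return mean > mu0, n  # confident enough to stop early


if __name__ == "__main__":
    # Hypothetical 1-D Bayesian logistic regression on synthetic data,
    # with a flat prior and a symmetric proposal (so those log terms are 0).
    data_rng = random.Random(0)
    xs = [data_rng.gauss(0.0, 1.0) for _ in range(10_000)]
    ys = [1 if data_rng.random() < 1.0 / (1.0 + math.exp(-2.0 * x)) else 0
          for x in xs]

    def log_lik(w: float, i: int) -> float:
        # Numerically stable log sigmoid(+/- w * x_i), depending on the label.
        s = w * xs[i] if ys[i] == 1 else -w * xs[i]
        if s > 0:
            return -math.log1p(math.exp(-s))
        return s - math.log1p(math.exp(s))

    w_cur, w_prop = 1.5, 1.6
    mu0 = mh_threshold(random.Random(1).random(), 0.0, 0.0, 0.0, 0.0, len(xs))
    accept, used = approx_mh_accept(
        lambda i: log_lik(w_prop, i) - log_lik(w_cur, i),
        len(xs), mu0, rng=random.Random(2))
    print(f"accept={accept}, terms evaluated={used} of {len(xs)}")
```

The design trades exactness for cost. When the subset already determines the sign of (mean - μ0) with high confidence, the decision uses only a fraction of the N terms. When it cannot, the loop falls back to the full, exact MH test.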
