Approximate Collapsed Gibbs Clustering with Expectation Propagation
This work addresses a computational bottleneck faced by researchers and practitioners who use MCMC methods for clustering. The contribution is incremental, as it builds on existing sampling and approximation techniques.
The paper tackles the intractability of collapsed Gibbs sampling in complex latent variable cluster models by approximating the necessary integrals with ideas borrowed from expectation propagation. The resulting sampler enables a runtime-accuracy tradeoff: in case studies on mixtures of Student-t's and time series clustering, it reaches competitive accuracy much faster than naive Gibbs samplers.
We develop a framework for approximating collapsed Gibbs sampling in generative latent variable cluster models. Collapsed Gibbs is a popular MCMC method that integrates out variables in the posterior to improve mixing. Unfortunately, for many complex models, integrating out these variables is either analytically or computationally intractable. We efficiently approximate the necessary collapsed Gibbs integrals by borrowing ideas from expectation propagation. We present two case studies where exact collapsed Gibbs sampling is intractable: mixtures of Student-t's and time series clustering. Our experiments on real and synthetic data show that our approximate sampler enables a runtime-accuracy tradeoff in sampling these types of models, providing results with competitive accuracy much more rapidly than the naive Gibbs samplers one would otherwise rely on in these scenarios.
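For readers unfamiliar with the baseline, the sketch below shows exact collapsed Gibbs in a setting where the integral is tractable: a one-dimensional Gaussian mixture with known observation variance and a conjugate Normal prior on each cluster mean. It is not the approximate sampler developed in this work. The model, the hyperparameters (`alpha`, `m0`, `s0`, `sigma`), and the function name `collapsed_gibbs_sweep` are all illustrative assumptions. The closed-form predictive density computed inside the loop is the quantity that becomes intractable in models such as mixtures of Student-t's.

```python
import numpy as np

def collapsed_gibbs_sweep(x, z, K, rng, alpha=1.0, m0=0.0, s0=10.0, sigma=1.0):
    """One sweep of collapsed Gibbs for a 1-D Gaussian mixture.

    Assumed model (illustrative only): mixture weights ~ Dirichlet(alpha/K),
    cluster means ~ N(m0, s0^2), observations ~ N(mean_k, sigma^2).
    Both the weights and the means are integrated out analytically.
    """
    counts = np.bincount(z, minlength=K).astype(float)
    sums = np.bincount(z, weights=x, minlength=K)
    for i in range(len(x)):
        # Remove point i from its current cluster's sufficient statistics.
        counts[z[i]] -= 1
        sums[z[i]] -= x[i]
        # Posterior over each cluster mean given the remaining points.
        prec = 1.0 / s0**2 + counts / sigma**2
        mean = (m0 / s0**2 + sums / sigma**2) / prec
        # Posterior predictive variance for x[i] with the mean integrated out.
        var = 1.0 / prec + sigma**2
        log_p = (np.log(counts + alpha / K)
                 - 0.5 * np.log(2.0 * np.pi * var)
                 - 0.5 * (x[i] - mean) ** 2 / var)
        p = np.exp(log_p - log_p.max())
        p /= p.sum()
        # Resample the assignment and restore the sufficient statistics.
        z[i] = rng.choice(K, p=p)
        counts[z[i]] += 1
        sums[z[i]] += x[i]
    return z

# Usage on synthetic data (values chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
K = 2
x = np.concatenate([rng.normal(-5.0, 1.0, 50), rng.normal(5.0, 1.0, 50)])
z = rng.integers(0, K, size=len(x))
for _ in range(20):
    z = collapsed_gibbs_sweep(x, z, K, rng)
```

Because the Normal prior is conjugate here, each conditional reduces to a Gaussian predictive density. When no such closed form exists, the integral in that step is the one that must be approximated.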