ML DC LG | Apr 8, 2013

ClusterCluster: Parallel Markov Chain Monte Carlo for Dirichlet Process Mixtures

Dan Lovell, Jonathan Malmaud, Ryan P. Adams, Vikash K. Mansinghka

arXiv:1304.2302v1 | 32 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of scaling Bayesian nonparametric modeling for researchers and practitioners in areas such as density estimation and natural language processing. It offers a parallelizable solution: an incremental advance that lets existing MCMC methods run efficiently on distributed systems.

The paper tackles the computational expense and lack of parallelizability in Markov Chain Monte Carlo (MCMC) inference for Dirichlet process mixtures by proposing a reparameterization that induces conditional independencies, enabling parallel simulation across multiple cores without altering the model or affecting the true posterior distribution. The result is a method tested in cluster configurations of over 50 machines and 100 cores, including experiments on 1 million data vectors in 256 dimensions, demonstrating improved parallel efficiency and convergence.

The Dirichlet process (DP) is a fundamental mathematical tool for Bayesian nonparametric modeling, and is widely used in tasks such as density estimation, natural language processing, and time series modeling. Although MCMC inference methods for the DP often provide a gold standard in terms of asymptotic accuracy, they can be computationally expensive and are not obviously parallelizable. We propose a reparameterization of the Dirichlet process that induces conditional independencies between the atoms that form the random measure. This conditional independence enables many of the Markov chain transition operators for DP inference to be simulated in parallel across multiple cores. Applied to mixture modeling, our approach enables the Dirichlet process to simultaneously learn clusters that describe the data and superclusters that define the granularity of parallelization. Unlike previous approaches, our technique does not require alteration of the model and leaves the true posterior distribution invariant. It also naturally lends itself to a distributed software implementation in terms of Map-Reduce, which we test in cluster configurations of over 50 machines and 100 cores. We present experiments exploring the parallel efficiency and convergence properties of our approach on both synthetic and real-world data, including runs on 1MM data vectors in 256 dimensions.
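The claim that superclusters can be introduced without changing the model can be illustrated with a small sketch. It assumes a two-stage Chinese restaurant construction: a data point first picks a supercluster k with probability proportional to αμ_k + n_k, then sits at a table inside it using a local Chinese restaurant process with concentration αμ_k. The function names, the supercluster weights `mu`, and the seating scheme are illustrative assumptions, not code or notation taken from the paper. The first function computes the seating probabilities exactly, so the marginal can be compared with an ordinary Chinese restaurant process with concentration α.

```python
import random
from fractions import Fraction


def two_stage_seating_probs(alpha, mu, tables):
    """Exact marginal seating probabilities under the assumed two-stage scheme.

    alpha  : overall concentration parameter (Fraction for exact arithmetic)
    mu     : supercluster weights, assumed positive and summing to 1
    tables : tables[k] is the list of table sizes inside supercluster k

    Returns (join, new): join[k][j] is the probability of joining table j of
    supercluster k; new is the total probability of opening a new table.
    """
    n = sum(sum(sizes) for sizes in tables)
    join = []
    new = Fraction(0)
    for mu_k, sizes in zip(mu, tables):
        n_k = sum(sizes)
        a_k = alpha * mu_k                      # local concentration
        p_super = (a_k + n_k) / (alpha + n)     # stage 1: pick supercluster
        # stage 2: local CRP inside the chosen supercluster
        join.append([p_super * m / (a_k + n_k) for m in sizes])
        new += p_super * a_k / (a_k + n_k)
    return join, new


def sample_partition(n_points, alpha, mu, rng):
    """Seat n_points sequentially with the same two-stage scheme."""
    tables = [[] for _ in mu]
    for _ in range(n_points):
        weights = [alpha * mu_k + sum(sizes) for mu_k, sizes in zip(mu, tables)]
        k = rng.choices(range(len(mu)), weights=weights)[0]
        local = tables[k] + [alpha * mu[k]]     # last slot means "new table"
        j = rng.choices(range(len(local)), weights=local)[0]
        if j == len(tables[k]):
            tables[k].append(1)
        else:
            tables[k][j] += 1
    return tables


if __name__ == "__main__":
    alpha = Fraction(3)
    mu = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    join, new = two_stage_seating_probs(alpha, mu, [[4, 1], [2], []])
    print(join, new)  # compare with m / (alpha + n) and alpha / (alpha + n)
    print(sample_partition(50, 2.0, [0.25] * 4, random.Random(0)))
```

In this sketch the supercluster terms cancel: a table with m points is joined with probability m / (α + n), and a new table is opened with probability α / (α + n), which are the seating probabilities of a single Chinese restaurant process. Once points are assigned to superclusters, seating within one supercluster depends only on its own tables, which is the kind of conditional independence that would let each supercluster be updated on a separate core.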
