ML · AI · Jan 8, 2012

A Split-Merge MCMC Algorithm for the Hierarchical Dirichlet Process

arXiv:1201.1657v1 · 49 citations
Originality: Incremental advance
AI Analysis

This work addresses a computational bottleneck for researchers using HDP models in applications like document analysis, though it is incremental as it adapts an existing method from the Dirichlet process to the HDP.

The authors tackled the problem of intractable posterior inference in the hierarchical Dirichlet process (HDP) by developing a novel split-merge MCMC algorithm, which showed significant improvements over traditional Gibbs sampling on synthetic data and text corpora.

The hierarchical Dirichlet process (HDP) has become an important Bayesian nonparametric model for grouped data, such as document collections. The HDP is used to construct a flexible mixed-membership model where the number of components is determined by the data. As for most Bayesian nonparametric models, exact posterior inference is intractable, so practitioners use Markov chain Monte Carlo (MCMC) or variational inference. Inspired by the split-merge MCMC algorithm for the Dirichlet process (DP) mixture model, we describe a novel split-merge MCMC sampling algorithm for posterior inference in the HDP. We study its properties on both synthetic data and text corpora. We find that split-merge MCMC for the HDP can provide significant improvements over traditional Gibbs sampling, and we give some understanding of the data properties that give rise to larger improvements.

