Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width
This work targets researchers in machine learning and statistics, particularly in NLP, who want a theoretical account of when Gibbs sampling performs well. It introduces a new graph property that, under suitable conditions on the factor weights, guarantees rapid mixing; the contribution is incremental in that it analyzes an existing sampling method rather than proposing a new one.
The paper addresses the weak theoretical guarantees available for Gibbs sampling on factor graphs. It introduces a new graph property called hierarchy width and shows that bounded hierarchy width ensures polynomial mixing time under suitable conditions on the weights. It also presents an NLP application in which Gibbs sampling provably mixes rapidly and achieves accuracy exceeding that of human volunteers.
Gibbs sampling on factor graphs is a widely used inference technique that often produces good empirical results. However, theoretical guarantees for its performance are weak: even for tree-structured graphs, the mixing time of Gibbs sampling may be exponential in the number of variables. To help understand the behavior of Gibbs sampling, we introduce a new (hyper)graph property, called hierarchy width. We show that under suitable conditions on the weights, bounded hierarchy width ensures polynomial mixing time. Our study of hierarchy width is in part motivated by a class of factor graph templates, hierarchical templates, which have bounded hierarchy width regardless of the data used to instantiate them. We demonstrate a rich application from natural language processing in which Gibbs sampling provably mixes rapidly and achieves accuracy that exceeds that of human volunteers.
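The abstract takes Gibbs sampling on factor graphs as given. As a hedged illustration (not the authors' implementation), here is a minimal sketch of random-scan, single-site Gibbs sampling over boolean variables, assuming a log-linear model in which the probability of an assignment is proportional to the exponential of the summed factor weights. The `Factor` representation and the helper names `gibbs_marginal`, `exact_marginal`, and `sigmoid` are hypothetical. The point it shows is that each update needs only the factors touching the chosen variable, and that the resulting estimate is only trustworthy once the chain has mixed.

```python
import itertools
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical factor representation (an assumption, not the paper's API):
# the variable indices a factor touches, plus a function mapping their values
# to a real weight. The model is assumed to be pi(x) ∝ exp(total weight).
Factor = Tuple[Tuple[int, ...], Callable[[Sequence[int]], float]]


def total_weight(x: Sequence[int], factors: List[Factor]) -> float:
    """Sum of factor weights evaluated at assignment x."""
    return sum(f(tuple(x[i] for i in scope)) for scope, f in factors)


def sigmoid(d: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-d))."""
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def gibbs_marginal(n_vars: int, factors: List[Factor], var: int,
                   n_steps: int, burn_in: int, rng: random.Random) -> float:
    """Estimate P(x_var = 1) with random-scan Gibbs over boolean variables."""
    # Only factors whose scope contains v affect v's conditional distribution.
    touching = [[fac for fac in factors if v in fac[0]] for v in range(n_vars)]
    x = [rng.randint(0, 1) for _ in range(n_vars)]
    hits = 0
    for step in range(burn_in + n_steps):
        v = rng.randrange(n_vars)  # choose a variable uniformly at random
        x[v] = 0
        w0 = total_weight(x, touching[v])
        x[v] = 1
        w1 = total_weight(x, touching[v])
        # P(x_v = 1 | all other variables) = sigmoid(w1 - w0).
        x[v] = 1 if rng.random() < sigmoid(w1 - w0) else 0
        if step >= burn_in:
            hits += x[var]
    return hits / n_steps


def exact_marginal(n_vars: int, factors: List[Factor], var: int) -> float:
    """Brute-force P(x_var = 1) by enumerating all 2^n assignments."""
    z = 0.0
    z1 = 0.0
    for x in itertools.product((0, 1), repeat=n_vars):
        p = math.exp(total_weight(x, factors))
        z += p
        if x[var] == 1:
            z1 += p
    return z1 / z


if __name__ == "__main__":
    def agree(vals: Sequence[int]) -> float:
        # Rewards neighbouring variables that take the same value.
        return 0.8 if vals[0] == vals[1] else 0.0

    def bias(vals: Sequence[int]) -> float:
        # Unary factor favouring the value 1.
        return 0.5 if vals[0] == 1 else 0.0

    # A 3-variable chain (a tree) with a unary bias on the first variable.
    chain = [((0, 1), agree), ((1, 2), agree), ((0,), bias)]
    rng = random.Random(0)
    print(gibbs_marginal(3, chain, var=2, n_steps=50_000, burn_in=1_000, rng=rng))
    print(exact_marginal(3, chain, var=2))
```

On a graph this small the Gibbs estimate can be checked against enumeration. The mixing-time question the paper studies is how large `n_steps` must grow, as the graph grows, before such an estimate is close to the exact value; enumeration itself is exponential in the number of variables and only serves as a check here.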