Sparse Stochastic Inference for Latent Dirichlet Allocation
This work addresses scalability and bias issues in Bayesian topic modeling for large-scale text analysis, representing an incremental improvement over existing methods.
The paper tackles the problem of scaling Bayesian topic models to large datasets by introducing a hybrid algorithm that combines sparse Gibbs sampling with online stochastic inference. The algorithm is used to analyze 1.2 million books (33 billion words) with thousands of topics, and it reduces bias compared with variational inference.
We present a hybrid algorithm for Bayesian topic models that combines the efficiency of sparse Gibbs sampling with the scalability of online stochastic inference. We used our algorithm to analyze a corpus of 1.2 million books (33 billion words) with thousands of topics. Our approach reduces the bias of variational inference and generalizes to many Bayesian hidden-variable models.
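The hybrid idea can be illustrated with a minimal sketch. It assumes the standard stochastic variational update for the K x V topic-word parameters λ, with step size ρ_t = (τ0 + t)^(-κ). The expected word-topic counts for each minibatch document are estimated by Gibbs sampling instead of a variational E-step. Because only (topic, word) pairs that actually get sampled contribute, these statistics are sparse. The function names `gibbs_local_counts` and `svi_step`, the hyperparameter defaults, and the use of normalized λ as a plug-in topic estimate inside the sampler are all hypothetical simplifications, not the paper's exact procedure. For clarity, the final update is also written densely rather than exploiting sparsity.

```python
import random
from collections import defaultdict


def gibbs_local_counts(doc, lam, lam_sums, alpha, num_sweeps, burn_in, rng):
    """Gibbs-sample topic assignments for one document, holding the global
    topic parameters `lam` fixed. Returns a sparse dict {(topic, word): count}
    averaged over the sweeps kept after burn-in."""
    if num_sweeps <= burn_in:
        raise ValueError("num_sweeps must exceed burn_in")
    K = len(lam)
    z = [rng.randrange(K) for _ in doc]  # random initial topic per token
    doc_topic = [0] * K                  # per-document topic counts n_dk
    for k in z:
        doc_topic[k] += 1
    stats = defaultdict(float)
    kept = 0
    for sweep in range(num_sweeps):
        for i, w in enumerate(doc):
            doc_topic[z[i]] -= 1  # exclude the current token from the counts
            # p(z_i = k | rest) is proportional to (n_dk + alpha) * beta_kw; here
            # beta is approximated by normalized lam (a hypothetical simplification).
            weights = [(doc_topic[k] + alpha) * lam[k][w] / lam_sums[k]
                       for k in range(K)]
            z[i] = rng.choices(range(K), weights=weights)[0]
            doc_topic[z[i]] += 1
        if sweep >= burn_in:
            kept += 1
            for i, w in enumerate(doc):
                stats[(z[i], w)] += 1.0
    # Only sampled (topic, word) pairs appear, so the statistics are sparse.
    return {key: c / kept for key, c in stats.items()}


def svi_step(lam, minibatch, corpus_size, t, eta=0.01, alpha=0.1,
             tau0=1.0, kappa=0.7, num_sweeps=5, burn_in=2, rng=None):
    """One stochastic update of the K x V topic parameters `lam` from a
    non-empty minibatch of documents (lists of word ids)."""
    rng = rng or random.Random(0)
    K, V = len(lam), len(lam[0])
    lam_sums = [sum(row) for row in lam]
    counts = defaultdict(float)
    for doc in minibatch:
        local = gibbs_local_counts(doc, lam, lam_sums, alpha,
                                   num_sweeps, burn_in, rng)
        for key, c in local.items():
            counts[key] += c
    rho = (tau0 + t) ** (-kappa)           # decreasing step size
    scale = corpus_size / len(minibatch)   # rescale minibatch to corpus size
    # lam <- (1 - rho) * lam + rho * (eta + scale * counts), written densely
    return [[(1 - rho) * lam[k][w]
             + rho * (eta + scale * counts.get((k, w), 0.0))
             for w in range(V)]
            for k in range(K)]


if __name__ == "__main__":
    rng = random.Random(42)
    K, V = 2, 4
    lam = [[1.0 + rng.random() for _ in range(V)] for _ in range(K)]
    corpus = [[0, 0, 1, 1], [2, 3, 3, 2], [0, 1, 0, 1], [2, 2, 3, 3]]
    for t in range(20):
        batch = rng.sample(corpus, 2)
        lam = svi_step(lam, batch, corpus_size=len(corpus), t=t, rng=rng)
    for k, row in enumerate(lam):
        print(k, [round(x, 2) for x in row])
```

In this sketch, the sampler handles the per-document (local) variables and the stochastic step handles the global topics. A scalable implementation would apply the (1 - ρ) decay lazily and touch only the sparse sampled entries, rather than rebuilding the full K x V matrix on every step.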