LG · CL · AP · ML · Oct 9, 2013

Improved Bayesian Logistic Supervised Topic Models with Data Augmentation

arXiv:1310.2408v1 · 20 citations
Originality: Incremental advance
AI Analysis

This paper addresses practical limitations of supervised topic models in text analysis applications, though it appears to be an incremental improvement on existing methods.

The paper tackles two issues in Bayesian logistic supervised topic models: the single response variable of a document tends to be outweighed by its many word counts, and existing variational inference relies on restrictive mean-field assumptions. It addresses the first by introducing a regularization constant and the second by developing a Gibbs sampling algorithm with auxiliary Polya-Gamma variables. Empirical results show significant improvements in prediction performance and time efficiency.

Supervised topic models with a logistic likelihood have two issues that potentially limit their practical use: 1) response variables are usually over-weighted by document word counts; and 2) existing variational inference methods make strict mean-field assumptions. We address these issues by: 1) introducing a regularization constant to better balance the two parts based on an optimization formulation of Bayesian inference; and 2) developing a simple Gibbs sampling algorithm by introducing auxiliary Polya-Gamma variables and collapsing out Dirichlet variables. Our augment-and-collapse sampling algorithm has analytical forms of each conditional distribution without making any restricting assumptions and can be easily parallelized. Empirical results demonstrate significant improvements on prediction performance and time efficiency.
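The augmentation step named in the abstract can be illustrated on a much smaller problem. The following is a minimal sketch, not the paper's collapsed sampler over topic assignments: it runs Gibbs sampling for a logistic model with a single scalar coefficient, alternating between Polya-Gamma auxiliary draws and a Gaussian update of the coefficient. It assumes the regularization constant `c` enters as an exponent on the logistic likelihood, and it approximates Polya-Gamma draws with a truncated sum-of-gammas series. The names `sample_pg`, `gibbs_logistic`, `terms`, and `prior_var` are hypothetical and chosen only for this illustration.

```python
import math
import random


def sample_pg(b, z, rng, terms=200):
    """Approximate draw from PG(b, z) using a truncated sum-of-gammas series.

    Truncating the infinite series at `terms` makes this an approximation,
    not an exact sampler.
    """
    shift = (z / (2.0 * math.pi)) ** 2
    total = 0.0
    for k in range(1, terms + 1):
        total += rng.gammavariate(b, 1.0) / ((k - 0.5) ** 2 + shift)
    return total / (2.0 * math.pi ** 2)


def gibbs_logistic(xs, ys, c=1.0, prior_var=10.0, sweeps=300, seed=0):
    """Gibbs sampler for a scalar coefficient eta with prior N(0, prior_var).

    Assumption: each observation contributes the logistic likelihood raised
    to the power c, with labels ys in {0, 1} and linear predictor x * eta.
    """
    rng = random.Random(seed)
    eta = 0.0
    draws = []
    for _ in range(sweeps):
        # Augment: one auxiliary variable per observation, given the current eta.
        omegas = [sample_pg(c, x * eta, rng) for x in xs]
        # Given the auxiliary variables, the conditional of eta is Gaussian.
        precision = 1.0 / prior_var + sum(w * x * x for w, x in zip(omegas, xs))
        kappa_dot_x = sum(c * (y - 0.5) * x for x, y in zip(xs, ys))
        mean = kappa_dot_x / precision
        eta = rng.gauss(mean, math.sqrt(1.0 / precision))
        draws.append(eta)
    return draws
```

Calling `gibbs_logistic([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1])` returns a list of posterior draws for the coefficient. Both conditionals have closed forms, which is the property the abstract highlights. In this sketch a smaller `c` lets the prior carry relatively more weight and a larger `c` lets the labels carry more.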

