LG · Mar 22, 2014

Hierarchical Dirichlet Scaling Process

arXiv:1404.1282v3 · 15 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for topic models that integrate document metadata. The advance is incremental: it builds directly on the hierarchical Dirichlet process (HDP).

The authors tackle the problem of modeling correlations between metadata and mixture components in Bayesian nonparametric mixed membership models by introducing the hierarchical Dirichlet scaling process (HDSP). In experiments on newswire, medical journal articles, conference proceedings, and product reviews, the HDSP gives better predictive performance than labeled LDA, partially labeled LDA, and the author topic model, and better negative review classification than the supervised topic model and SVM.

We present the hierarchical Dirichlet scaling process (HDSP), a Bayesian nonparametric mixed membership model. The HDSP generalizes the hierarchical Dirichlet process (HDP) to model the correlation structure between metadata in the corpus and mixture components. We construct the HDSP based on the normalized gamma representation of the Dirichlet process, and this construction allows incorporating a scaling function that controls the membership probabilities of the mixture components. We develop two scaling methods to demonstrate that different modeling assumptions can be expressed in the HDSP. We also derive the corresponding approximate posterior inference algorithms using variational Bayes. Through experiments on datasets of newswire, medical journal articles, conference proceedings, and product reviews, we show that the HDSP results in better predictive performance than labeled LDA, partially labeled LDA, and the author topic model, and better negative review classification performance than the supervised topic model and SVM.
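
The normalized gamma construction mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a finite truncation to K components, a fixed top-level weight vector `beta`, and a hypothetical list `scale` standing in for a scaling function already evaluated on one document's metadata. The function name `scaled_membership` and all numeric values are made up for illustration. With every scaling value equal to 1, normalizing the independent gamma draws gives an ordinary Dirichlet draw; unequal scaling values shift membership probability toward the components they favor.

```python
import random


def scaled_membership(beta, alpha, scale, rng):
    """Draw membership probabilities from a truncated normalized-gamma construction.

    beta  : top-level component weights (K entries, summing to 1)
    alpha : concentration parameter
    scale : positive per-component scaling values (hypothetical stand-in for a
            scaling function evaluated on a document's metadata)
    rng   : a random.Random instance, passed in for reproducibility
    """
    # Independent gamma draws with shape alpha * beta_k and unit scale.
    gammas = [rng.gammavariate(alpha * b, 1.0) for b in beta]
    # Multiply each draw by its scaling value before normalizing.
    scaled = [g * s for g, s in zip(gammas, scale)]
    total = sum(scaled)
    return [x / total for x in scaled]


rng = random.Random(0)
beta = [0.5, 0.3, 0.2]  # illustrative top-level weights

# Unit scaling: equivalent to a Dirichlet(alpha * beta) draw.
plain = scaled_membership(beta, alpha=5.0, scale=[1.0, 1.0, 1.0], rng=rng)

# Large scaling value on the third component: its membership tends to grow.
boosted = scaled_membership(beta, alpha=5.0, scale=[1.0, 1.0, 50.0], rng=rng)

print(plain)
print(boosted)
```

Any single draw is random, so the effect of the scaling values is best seen by averaging the third component's weight over many draws with and without the boost.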

