LG, ML, Jun 18, 2012

Dirichlet Process with Mixed Random Measures: A Nonparametric Topic Model for Labeled Data

arXiv:1206.4658v1, 23 citations
Originality: Incremental advance
AI Analysis

This work addresses topic modeling for labeled data such as documents and images. It appears incremental because it builds on the existing HDP and mixed random measures frameworks.

The authors model labeled data with a nonparametric topic model that generates an unbounded number of topics per label and shows improved label-prediction performance compared to existing methods such as MedLDA and LDA-SVM.

We describe a nonparametric topic model for labeled data. The model uses a mixture of random measures (MRM) as a base distribution of the Dirichlet process (DP) of the HDP framework, so we call it the DP-MRM. To model labeled data, we define a DP distributed random measure for each label, and the resulting model generates an unbounded number of topics for each label. We apply DP-MRM on single-labeled and multi-labeled corpora of documents and compare the performance on label prediction with MedLDA, LDA-SVM, and Labeled-LDA. We further enhance the model by incorporating ddCRP and modeling multi-labeled images for image segmentation and object labeling, comparing the performance with nCuts and rddCRP.
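The generative structure described in the abstract can be illustrated with a rough sketch. The code below is not the authors' implementation: it assumes a truncated stick-breaking approximation to each DP, Dirichlet-distributed mixing weights over a document's labels, and symmetric Dirichlet topics. The names `stick_breaking` and `sample_corpus`, and all hyperparameter values, are hypothetical choices made for illustration only.

```python
import numpy as np

def stick_breaking(concentration, truncation, rng):
    """Truncated stick-breaking weights approximating a DP draw."""
    betas = rng.beta(1.0, concentration, size=truncation)
    betas[-1] = 1.0  # close the stick so the weights sum to 1
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    weights = betas * remaining
    return weights / weights.sum()  # guard against rounding error

def sample_corpus(doc_labels, n_labels, vocab_size=50, doc_len=40,
                  truncation=20, alpha=1.0, beta=1.0, seed=0):
    """Sample documents whose topics come only from their own labels."""
    rng = np.random.default_rng(seed)
    # One DP-distributed random measure per label; its atoms are topics.
    label_weights = [stick_breaking(alpha, truncation, rng)
                     for _ in range(n_labels)]
    label_topics = [rng.dirichlet(np.full(vocab_size, 0.5), size=truncation)
                    for _ in range(n_labels)]
    docs = []
    for labels in doc_labels:
        # Assumed: Dirichlet mixing weights over the document's labels.
        mix = rng.dirichlet(np.ones(len(labels)))
        # Base distribution: mixture of the label-level random measures.
        base_p = np.concatenate([m * label_weights[l]
                                 for m, l in zip(mix, labels)])
        base_p /= base_p.sum()
        atoms = [(l, t) for l in labels for t in range(truncation)]
        # Document-level DP whose atoms are drawn from the mixed base.
        doc_w = stick_breaking(beta, truncation, rng)
        atom_idx = rng.choice(len(atoms), size=truncation, p=base_p)
        words, used_labels = [], []
        for _ in range(doc_len):
            k = rng.choice(truncation, p=doc_w)
            label, topic = atoms[atom_idx[k]]
            words.append(int(rng.choice(vocab_size,
                                        p=label_topics[label][topic])))
            used_labels.append(label)
        docs.append((words, used_labels))
    return docs

# Three documents: one single-labeled, two multi-labeled.
corpus = sample_corpus([[0], [1, 2], [0, 2]], n_labels=3)
```

Each returned pair holds a document's word ids and the label whose topic generated each word. In this sketch a document can only use topics belonging to its own labels, which mirrors the role the per-label random measures play in the description above. The truncation level stands in for the unbounded number of topics and is purely a simulation convenience.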

