Dirichlet Process with Mixed Random Measures: A Nonparametric Topic Model for Labeled Data
This work addresses topic modeling for labeled data such as documents and images. It appears incremental, since it builds on the existing HDP framework and on mixtures of random measures.
The authors model labeled data with a nonparametric topic model that generates an unbounded number of topics per label and shows improved label-prediction performance over existing methods such as MedLDA and LDA-SVM.
We describe a nonparametric topic model for labeled data. The model uses a mixture of random measures (MRM) as the base distribution of the Dirichlet process (DP) in the hierarchical Dirichlet process (HDP) framework, so we call it the DP-MRM. To model labeled data, we define a DP-distributed random measure for each label, and the resulting model generates an unbounded number of topics for each label. We apply DP-MRM to single-labeled and multi-labeled corpora of documents and compare its label-prediction performance with MedLDA, LDA-SVM, and Labeled-LDA. We further extend the model by incorporating the distance dependent Chinese restaurant process (ddCRP) and model multi-labeled images for image segmentation and object labeling, comparing the performance with nCuts and rddCRP.
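To make the construction concrete, the following is a minimal sketch of one way such a generative process could be simulated. It is not the authors' implementation. Several details are assumptions that the abstract does not state: the truncated stick-breaking approximation for each label's DP, the symmetric Dirichlet prior on the per-document mixing weights, the restriction of each document to the measures of its own labels, and all function names and hyperparameters (`gamma`, `eta`, `alpha`, `beta`, `truncation`). Inference is not shown.

```python
import random

def stick_breaking(concentration, truncation, rng):
    """Truncated stick-breaking weights; the last stick takes the remainder."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        frac = rng.betavariate(1.0, concentration)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)
    return weights

def dirichlet(alphas, rng):
    """Dirichlet draw via normalized gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def sample_label_measures(labels, vocab_size, gamma, eta, truncation, rng):
    """One (truncated) DP-distributed random measure per label.

    Each measure is a list of (weight, topic) atoms, where a topic is a
    distribution over the vocabulary. Truncation is an approximation
    (assumption); the model itself places no bound on topics per label.
    """
    measures = {}
    for label in labels:
        weights = stick_breaking(gamma, truncation, rng)
        topics = [dirichlet([eta] * vocab_size, rng) for _ in range(truncation)]
        measures[label] = list(zip(weights, topics))
    return measures

def sample_document(doc_labels, measures, alpha, beta, n_words, rng):
    """Generate words from a DP whose base is a mixture of label measures."""
    # Mixing weights over this document's labels (assumed Dirichlet prior).
    lam = dirichlet([alpha] * len(doc_labels), rng)
    atoms = [(label, idx)
             for label in doc_labels
             for idx in range(len(measures[label]))]
    base_weights = [lam_k * weight
                    for lam_k, label in zip(lam, doc_labels)
                    for weight, _ in measures[label]]
    # Polya urn view of the document-level DP(beta, base): each word either
    # draws a fresh atom from the mixed base or reuses an earlier word's atom.
    assignments, words = [], []
    for i in range(n_words):
        if rng.random() < beta / (beta + i):
            atom = rng.choices(atoms, weights=base_weights)[0]
        else:
            atom = rng.choice(assignments)
        assignments.append(atom)
        label, idx = atom
        topic = measures[label][idx][1]
        words.append(rng.choices(range(len(topic)), weights=topic)[0])
    return list(zip(assignments, words))

if __name__ == "__main__":
    rng = random.Random(0)
    measures = sample_label_measures(["sports", "politics"], 6, 1.0, 0.5, 8, rng)
    for (label, topic_idx), word in sample_document(["sports"], measures, 1.0, 2.0, 5, rng):
        print(label, topic_idx, word)
```

In this sketch every word is assigned to a topic that belongs to exactly one label, which is what would allow a label-prediction step to compare how well each label's topics explain a held-out document.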