IR | ME | Jun 13, 2012

Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression

arXiv:1206.3278v1 | 431 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of integrating text and document metadata in topic models. It offers natural language processing researchers a flexible approach that is an incremental advance over prior topic models.

The paper tackles the challenge of incorporating document metadata into topic modeling. It proposes a Dirichlet-multinomial regression (DMR) topic model whose log-linear prior on document-topic distributions depends on observed features such as author and publication venue, and it shows that, with appropriately chosen features, the model can meet or exceed the performance of existing specialized models.

Although fully generative models have been successfully used to model the contents of text documents, they are often awkward to apply to combinations of text data and document metadata. In this paper we propose a Dirichlet-multinomial regression (DMR) topic model that includes a log-linear prior on document-topic distributions that is a function of observed features of the document, such as author, publication venue, references, and dates. We show that by selecting appropriate features, DMR topic models can meet or exceed the performance of several previously published topic models designed for specific data.
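The log-linear prior described in the abstract can be illustrated with a minimal sketch. It assumes the usual DMR form, in which each document's Dirichlet parameter for topic k is the exponential of a dot product between the document's feature vector and a topic-specific weight vector. The function names, feature layout, and weight values below are hypothetical and chosen only for illustration; the sketch covers the prior alone, not inference or training.

```python
import math
import random


def dmr_alpha(features, lambdas):
    """Log-linear Dirichlet parameters: alpha[k] = exp(features . lambdas[k])."""
    return [math.exp(sum(x * w for x, w in zip(features, lam))) for lam in lambdas]


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]


# Hypothetical setup: 3 topics, features = [bias, is_venue_A, is_venue_B].
lambdas = [
    [0.0, 2.0, -1.0],   # topic 0: favoured by venue A
    [0.0, -1.0, 2.0],   # topic 1: favoured by venue B
    [-1.0, 0.0, 0.0],   # topic 2: weak and independent of venue
]
doc_a = [1.0, 1.0, 0.0]  # a document published in venue A
doc_b = [1.0, 0.0, 1.0]  # a document published in venue B

rng = random.Random(0)
alpha_a = dmr_alpha(doc_a, lambdas)
alpha_b = dmr_alpha(doc_b, lambdas)
theta_a = sample_dirichlet(alpha_a, rng)  # document-topic distribution for doc_a
theta_b = sample_dirichlet(alpha_b, rng)  # document-topic distribution for doc_b
```

In this sketch the two documents differ only in their venue feature, yet they receive different Dirichlet priors: the venue A document puts most prior mass on topic 0 and the venue B document on topic 1. Adding another kind of metadata (an author indicator, a publication year) would only add entries to the feature and weight vectors, leaving the rest unchanged.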
