MLJan 16, 2013

A Nested HDP for Hierarchical Topic Models

arXiv:1301.3570v19 citations
Originality Incremental advance
AI Analysis

This addresses the limitation of thematic borrowing in hierarchical topic models for text analysis, though it appears incremental as a generalization of existing methods.

The authors tackled the problem of rigid single-path formulations in hierarchical topic modeling by developing a nested hierarchical Dirichlet process (nHDP) that allows each word to follow its own path to a topic node, demonstrating it on 1.8 million documents from The New York Times.

We develop a nested hierarchical Dirichlet process (nHDP) for hierarchical topic modeling. The nHDP is a generalization of the nested Chinese restaurant process (nCRP) that allows each word to follow its own path to a topic node according to a document-specific distribution on a shared tree. This alleviates the rigid, single-path formulation of the nCRP, allowing a document to more easily express thematic borrowings as a random effect. We demonstrate our algorithm on 1.8 million documents from The New York Times.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes