A Nested HDP for Hierarchical Topic Models
This paper addresses the difficulty of expressing thematic borrowing in hierarchical topic models for text analysis, though the contribution appears incremental, since it generalizes existing methods.
The authors tackle the rigid single-path formulation used in hierarchical topic modeling by developing a nested hierarchical Dirichlet process (nHDP) in which each word follows its own path to a topic node, and they demonstrate it on 1.8 million documents from The New York Times.
We develop a nested hierarchical Dirichlet process (nHDP) for hierarchical topic modeling. The nHDP is a generalization of the nested Chinese restaurant process (nCRP) that allows each word to follow its own path to a topic node according to a document-specific distribution on a shared tree. This alleviates the rigid, single-path formulation of the nCRP, allowing a document to more easily express thematic borrowings as a random effect. We demonstrate our algorithm on 1.8 million documents from The New York Times.
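The per-word path idea can be illustrated with a small, finite sketch. This is not the authors' algorithm: it replaces the infinite tree and the Dirichlet process draws with a fixed-depth, fixed-branching tree and finite Dirichlet draws, and it uses a single constant stopping probability. All function names and the parameters `alpha`, `beta`, and `stop_prob` are hypothetical and chosen only for illustration.

```python
import random

def dirichlet(rng, alphas):
    # Draw from a Dirichlet by normalizing independent Gamma variates.
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def build_global_tree(rng, branching, depth, alpha=1.0):
    # Shared tree: map each internal node (a tuple of child indices)
    # to global weights over its children. Finite stand-in for the
    # infinite tree (assumption).
    tree = {}
    frontier = [()]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            tree[node] = dirichlet(rng, [alpha] * branching)
            next_frontier.extend(node + (k,) for k in range(branching))
        frontier = next_frontier
    return tree

def document_tree(rng, global_tree, beta=5.0):
    # Document-specific child weights, centered on the global weights.
    # The small floor keeps every Gamma shape parameter strictly positive.
    return {
        node: dirichlet(rng, [max(beta * w, 1e-6) for w in weights])
        for node, weights in global_tree.items()
    }

def sample_word_path(rng, doc_tree, stop_prob=0.5):
    # Each word walks down from the root using the document's weights and
    # stops at a node with probability stop_prob (a simplification) or
    # when it reaches a leaf of the finite tree.
    node = ()
    while True:
        weights = doc_tree[node]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        node = node + (k,)
        if node not in doc_tree or rng.random() < stop_prob:
            return node

if __name__ == "__main__":
    rng = random.Random(0)
    shared = build_global_tree(rng, branching=3, depth=3)
    doc = document_tree(rng, shared)
    # Words in the same document may land on different topic nodes.
    for _ in range(5):
        print(sample_word_path(rng, doc))
```

Under these assumptions, words from one document can end at different nodes of the shared tree, whereas a single-path model such as the nCRP would restrict every word in the document to nodes along one root-to-leaf path.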