Vectorized Bayesian Inference for Latent Dirichlet-Tree Allocation
This work addresses the modeling of complex topic structures in text data, a problem relevant to researchers and practitioners in natural language processing, and presents an incremental improvement over Latent Dirichlet Allocation (LDA).
The paper tackles the limitation of Latent Dirichlet Allocation (LDA) in representing topic correlations and hierarchies by introducing Latent Dirichlet-Tree Allocation (LDTA), which uses a Dirichlet-Tree prior, and develops vectorized inference methods that maintain scalability and efficiency.
Latent Dirichlet Allocation (LDA) is a foundational model for discovering latent thematic structure in discrete data, but its Dirichlet prior cannot represent the rich correlations and hierarchical relationships often present among topics. We introduce Latent Dirichlet-Tree Allocation (LDTA), a generalization of LDA that replaces the Dirichlet prior with an arbitrary Dirichlet-Tree (DT) distribution. LDTA preserves LDA's generative structure while enabling expressive, tree-structured priors over topic proportions. For inference, we develop universal mean-field variational inference and Expectation Propagation algorithms that provide tractable updates for any DT distribution. Through theoretical development, we show that both inference methods have a vectorized form, and we provide fully vectorized, GPU-accelerated implementations. The resulting framework substantially expands the modeling capacity of LDA while maintaining scalability and computational efficiency.
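To make the tree-structured prior concrete, the following minimal sketch draws topic proportions from a Dirichlet-Tree distribution using its standard construction: each internal node draws Dirichlet branch probabilities over its children, and each leaf (topic) receives the product of the branch probabilities on its root-to-leaf path. The function name `sample_dirichlet_tree`, the dictionary encoding of the tree, and the example parameters are illustrative assumptions, not taken from the paper. The sketch uses a plain Python loop and is not the vectorized, GPU-accelerated implementation described above.

```python
import numpy as np

def sample_dirichlet_tree(children, alpha, n_leaves, rng, root="root"):
    """Draw one topic-proportion vector from a Dirichlet-Tree prior.

    children: dict mapping each internal node to a list of its children;
              leaves are integer topic indices 0..n_leaves-1 (assumed encoding).
    alpha:    dict mapping each internal node to positive concentration
              parameters, one per child.
    """
    theta = np.zeros(n_leaves)
    stack = [(root, 1.0)]  # (node, probability mass reaching that node)
    while stack:
        node, mass = stack.pop()
        # Branch probabilities at this internal node are Dirichlet distributed.
        branch = rng.dirichlet(alpha[node])
        for child, b in zip(children[node], branch):
            if child in children:      # internal node: pass the mass down
                stack.append((child, mass * b))
            else:                      # leaf: product of branches on the path
                theta[child] = mass * b
    return theta

# Hypothetical two-level tree over four topics: topics 0 and 1 share parent "a",
# topics 2 and 3 share parent "b".
rng = np.random.default_rng(0)
children = {"root": ["a", "b"], "a": [0, 1], "b": [2, 3]}
alpha = {"root": [1.0, 1.0], "a": [5.0, 5.0], "b": [0.5, 0.5]}
theta = sample_dirichlet_tree(children, alpha, n_leaves=4, rng=rng)
```

When the tree has a single internal node (the root) with all topics as its children, the draw reduces to an ordinary Dirichlet sample, which is the sense in which LDTA generalizes LDA.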