LG · DC · IR · ML · Nov 17, 2013

Towards Big Topic Modeling

arXiv:1311.4150v1 · 8 citations
Originality: Incremental advance
AI Analysis

This work addresses scalability issues for researchers and practitioners handling large-scale topic modeling tasks, representing an incremental improvement over existing parallel LDA methods.

The paper tackles the scalability problem in big topic modeling by reducing the communication cost of parallel latent Dirichlet allocation (LDA) algorithms. It proposes a communication-efficient architecture based on a power law that spends orders of magnitude less time on communication when the number of topics is large, and it reports high accuracy, fast speed, and constant memory usage relative to state-of-the-art parallel LDA methods.

To solve the big topic modeling problem, we need to reduce both the time and space complexities of batch latent Dirichlet allocation (LDA) algorithms. Although parallel LDA algorithms on the multi-processor architecture have low time and space complexities, their communication costs among processors often scale linearly with the vocabulary size and the number of topics, leading to a serious scalability problem. To reduce the communication complexity among processors for better scalability, we propose a novel communication-efficient parallel topic modeling architecture based on the power law, which consumes orders of magnitude less communication time when the number of topics is large. We combine the proposed communication-efficient parallel architecture with the online belief propagation (OBP) algorithm, and refer to the combination as POBP, for big topic modeling tasks. Extensive empirical results confirm that, compared with recent state-of-the-art parallel LDA algorithms on the multi-processor architecture, POBP has the following advantages for big topic modeling: 1) high accuracy, 2) communication efficiency, 3) fast speed, and 4) constant memory usage.
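The abstract states that communication cost grows linearly with the vocabulary size and the number of topics, and that a power law is used to cut it, but it does not describe the selection rule. The sketch below is therefore only an illustration, not POBP itself: it assumes word and topic weights follow a Zipf-like distribution and compares the number of matrix entries exchanged in a full synchronization against a hypothetical scheme that synchronizes only the "head" of words and topics covering a chosen share of the total weight. The function names, the `coverage` parameter, and the Zipf exponent are all assumptions made for the example.

```python
import numpy as np


def full_sync_cost(vocab_size, num_topics, num_procs):
    """Entries exchanged per sync if every processor sends a full W x K matrix."""
    return vocab_size * num_topics * num_procs


def head_size(weights, coverage):
    """Smallest number of top-weighted items whose weights reach `coverage` of the total."""
    sorted_w = np.sort(np.asarray(weights, dtype=float))[::-1]
    cumulative = np.cumsum(sorted_w) / sorted_w.sum()
    # First index where the cumulative share reaches the target, as a count.
    count = int(np.searchsorted(cumulative, coverage)) + 1
    return min(count, len(sorted_w))


def selective_sync_cost(word_weights, topic_weights, num_procs, coverage):
    """Entries exchanged if only the head words and head topics are synchronized.

    Hypothetical selection rule, used here only to illustrate the effect of
    power-law concentration on communication volume.
    """
    head_words = head_size(word_weights, coverage)
    head_topics = head_size(topic_weights, coverage)
    return head_words * head_topics * num_procs


def zipf_weights(n, exponent=1.0):
    """Zipf-like weights 1 / rank**exponent for ranks 1..n (assumed distribution)."""
    ranks = np.arange(1, n + 1, dtype=float)
    return 1.0 / ranks ** exponent


if __name__ == "__main__":
    W, K, P = 10_000, 1_000, 8  # illustrative sizes, not taken from the paper
    full = full_sync_cost(W, K, P)
    selective = selective_sync_cost(zipf_weights(W), zipf_weights(K), P, coverage=0.9)
    print(f"full sync entries:      {full}")
    print(f"selective sync entries: {selective}")
    print(f"reduction factor:       {full / selective:.1f}x")
```

How much this saves depends entirely on how concentrated the weights are: a steeper exponent or a lower `coverage` shrinks the head, while uniform weights leave almost nothing to prune.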

