CL LG ML · Jan 24, 2013

Transfer Topic Modeling with Ease and Scalability

arXiv:1301.5686v2 · 28 citations

AI Analysis

This work addresses the need for effective topic modeling in social media applications, offering a scalable solution that leverages external labeled data, though it is incremental as it builds on existing LDA frameworks.

The paper addresses topic modeling for short social media texts, which traditional methods such as LDA handle poorly because of sparsity and scalability issues. It proposes a transfer learning approach that uses labeled documents from other domains to improve model fitting and interpretation, and it evaluates the approach on a microblogging dataset and on the AP and RCV1 collections.

The increasing volume of short texts generated on social media sites, such as Twitter or Facebook, creates a great demand for effective and efficient topic modeling approaches. While latent Dirichlet allocation (LDA) can be applied, it is not optimal due to its weakness in handling short texts with fast-changing topics and its scalability concerns. In this paper, we propose a transfer learning approach that utilizes abundant labeled documents from other domains (such as Yahoo! News or Wikipedia) to improve topic modeling, with better model fitting and result interpretation. Specifically, we develop the Transfer Hierarchical LDA (thLDA) model, which incorporates the label information from other domains via informative priors. In addition, we develop a parallel implementation of our model for large-scale applications. We demonstrate the effectiveness of our thLDA model on both a microblogging dataset and standard text collections, including the AP and RCV1 datasets.
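The abstract does not spell out how the informative priors are built, so the following is only a minimal sketch of the general idea, not the thLDA model itself. It assumes that each source-domain label gets one Dirichlet prior vector over the vocabulary, and that words frequent under a label receive more pseudo-count mass. The function name `informative_topic_word_prior`, the parameters `base` and `strength`, and the toy documents are all hypothetical.

```python
from collections import Counter


def informative_topic_word_prior(labeled_docs, vocab, base=0.01, strength=10.0):
    """Build one Dirichlet prior vector over `vocab` per source-domain label.

    labeled_docs: iterable of (label, tokens) pairs from the source domain.
    base: symmetric pseudo-count given to every word (assumed value).
    strength: extra pseudo-count mass per label, spread in proportion
        to that label's word frequencies (assumed value).
    """
    vocab_set = set(vocab)

    # Count in-vocabulary word occurrences separately for each label.
    counts = {}
    for label, tokens in labeled_docs:
        counter = counts.setdefault(label, Counter())
        counter.update(t for t in tokens if t in vocab_set)

    # Turn the counts into asymmetric prior vectors aligned with `vocab`.
    priors = {}
    for label, counter in counts.items():
        total = sum(counter.values())
        priors[label] = [
            base + (strength * counter[w] / total if total else 0.0)
            for w in vocab
        ]
    return priors


# Toy source-domain documents (hypothetical data, for illustration only).
source_docs = [
    ("sports", ["game", "team", "score", "team"]),
    ("finance", ["market", "stock", "score"]),
]
vocab = ["game", "team", "score", "market", "stock"]
priors = informative_topic_word_prior(source_docs, vocab)
```

Each resulting vector could then serve as the topic-word prior for one topic in an LDA-style sampler on the short-text target domain, in place of a symmetric prior. How thLDA arranges such priors in its hierarchy is described in the paper, not here.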
