LG · CL · IR · ML · Sep 26, 2013

Integrating Document Clustering and Topic Modeling

arXiv:1309.6874v1 · 211 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses a specific problem in text mining by combining clustering and topic modeling, but it appears incremental as it builds on existing methods without introducing a new paradigm.

The paper proposes a multi-grain clustering topic model (MGCTM) that performs document clustering and topic modeling jointly rather than as separate steps. Experiments on two datasets are reported to demonstrate its effectiveness.

Document clustering and topic modeling are two closely related tasks which can mutually benefit each other. Topic modeling can project documents into a topic space which facilitates effective document clustering. Cluster labels discovered by document clustering can be incorporated into topic models to extract local topics specific to each cluster and global topics shared by all clusters. In this paper, we propose a multi-grain clustering topic model (MGCTM) which integrates document clustering and topic modeling into a unified framework and jointly performs the two tasks to achieve the overall best performance. Our model tightly couples two components: a mixture component used for discovering latent groups in a document collection and a topic model component used for mining multi-grain topics, including local topics specific to each cluster and global topics shared across clusters. We employ variational inference to approximate the posterior of hidden variables and learn model parameters. Experiments on two datasets demonstrate the effectiveness of our model.
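To make the two coupled components concrete, here is a minimal sketch of a generative process in the spirit of the abstract: a mixture component assigns each document to a latent cluster, and each word is then drawn from either a cluster-specific local topic or a global topic shared by all clusters. The abstract does not give the model's equations, so everything specific here is an assumption for illustration: the function names (`dirichlet`, `categorical`, `generate_corpus`), the per-document local/global switch probability, the Dirichlet and Beta priors, and all default sizes. It is not the authors' implementation, and it covers only sampling, not the variational inference used for learning.

```python
import random


def dirichlet(alpha, rng):
    """Sample a probability vector from a Dirichlet via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def categorical(probs, rng):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(n_docs=6, doc_len=20, n_clusters=2, n_local=2,
                    n_global=2, vocab_size=30, seed=0):
    """Sample a toy corpus; all hyperparameters are illustrative assumptions."""
    rng = random.Random(seed)

    # Mixture component: cluster weights shared by the whole collection.
    pi = dirichlet([1.0] * n_clusters, rng)

    # Topic component: each cluster owns its local topics; global topics are shared.
    local_topics = [
        [dirichlet([0.5] * vocab_size, rng) for _ in range(n_local)]
        for _ in range(n_clusters)
    ]
    global_topics = [dirichlet([0.5] * vocab_size, rng) for _ in range(n_global)]

    corpus = []
    for _ in range(n_docs):
        cluster = categorical(pi, rng)             # latent cluster label
        theta_local = dirichlet([0.5] * n_local, rng)
        theta_global = dirichlet([0.5] * n_global, rng)
        p_local = rng.betavariate(1.0, 1.0)        # assumed local/global switch

        words = []
        for _ in range(doc_len):
            if rng.random() < p_local:
                # Word comes from a local topic of this document's cluster.
                topic = categorical(theta_local, rng)
                word = categorical(local_topics[cluster][topic], rng)
            else:
                # Word comes from a topic shared across all clusters.
                topic = categorical(theta_global, rng)
                word = categorical(global_topics[topic], rng)
            words.append(word)
        corpus.append({"cluster": cluster, "words": words})
    return corpus
```

Calling `generate_corpus()` returns a list of documents, each with a sampled cluster label and a list of word indices. In a model of this shape, inference would run in the opposite direction, recovering cluster labels and both kinds of topics from observed words alone.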

