CL AI LG STJan 25, 2023

Improving the Inference of Topic Models via Infinite Latent State Replications

Daniel Rugeles, Zhen Hai, Juan Felipe Carmona, Manoranjan Dash, Gao Cong

arXiv:2301.12974v10.51 citationsh-index: 13

Originality Incremental advance

AI Analysis

This work addresses inference robustness in topic modeling for text mining, but it is incremental as it builds on existing methods like CGS.

The paper tackles the problem of improving inference in topic models by proposing infinite latent state replication (ILR), which outperforms collapsed Gibbs sampling (CGS) on established models, as shown in experiments on publicly available datasets.

In text mining, topic models are a type of probabilistic generative models for inferring latent semantic topics from text corpus. One of the most popular inference approaches to topic models is perhaps collapsed Gibbs sampling (CGS), which typically samples one single topic label for each observed document-word pair. In this paper, we aim at improving the inference of CGS for topic models. We propose to leverage state augmentation technique by maximizing the number of topic samples to infinity, and then develop a new inference approach, called infinite latent state replication (ILR), to generate robust soft topic assignment for each given document-word pair. Experimental results on the publicly available datasets show that ILR outperforms CGS for inference of existing established topic models.

View on arXiv PDF

Similar