CL | Nov 30, 2024

GloCOM: A Short Text Neural Topic Model via Global Clustering Context

Quang Duc Nguyen, Tung Nguyen, Duc Anh Nguyen, Linh Ngo Van, Sang Dinh, Thien Huu Nguyen

arXiv:2412.00525v29.614 citations | h-index: 11 | NAACL

Originality: Incremental advance

AI Analysis

This work addresses topic modeling for short texts. The advance is incremental: it builds on existing neural topic models by incorporating global contexts.

The paper tackles the challenge of uncovering hidden topics in short texts by proposing GloCOM, a neural topic model that constructs aggregated global clustering contexts to address data and label sparsity. The authors report that it outperforms state-of-the-art models in both topic quality and document representations.

Uncovering hidden topics from short texts is challenging for traditional and neural models due to data sparsity, which limits word co-occurrence patterns, and label sparsity, stemming from incomplete reconstruction targets. Although data aggregation offers a potential solution, existing neural topic models often overlook it due to time complexity, poor aggregation quality, and difficulty in inferring topic proportions for individual documents. In this paper, we propose a novel model, GloCOM (Global Clustering COntexts for Topic Models), which addresses these challenges by constructing aggregated global clustering contexts for short documents, leveraging text embeddings from pre-trained language models. GloCOM can infer both global topic distributions for clustering contexts and local distributions for individual short texts. Additionally, the model incorporates these global contexts to augment the reconstruction loss, effectively handling the label sparsity issue. Extensive experiments on short text datasets show that our approach outperforms other state-of-the-art models in both topic quality and document representations.
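The aggregation idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes cluster assignments are already available (for example from clustering pre-trained language model embeddings), that a global context is simply the summed bag-of-words of a cluster's documents, and that the augmented reconstruction target is the local counts plus a weighted copy of that global context. The function names and the weight `eta` are hypothetical.

```python
from collections import defaultdict


def build_global_contexts(bows, cluster_ids):
    """Sum the bag-of-words vectors of all documents sharing a cluster id."""
    vocab_size = len(bows[0])
    contexts = defaultdict(lambda: [0] * vocab_size)
    for bow, cid in zip(bows, cluster_ids):
        ctx = contexts[cid]
        for j, count in enumerate(bow):
            ctx[j] += count
    return dict(contexts)


def augmented_targets(bows, cluster_ids, contexts, eta=0.5):
    """Assumed target: local counts plus eta times the cluster's summed counts."""
    return [
        [x + eta * g for x, g in zip(bow, contexts[cid])]
        for bow, cid in zip(bows, cluster_ids)
    ]


# Toy corpus: 4 short documents over a 5-word vocabulary.
bows = [
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
]
# Assumed output of clustering document embeddings (not computed here).
cluster_ids = [0, 0, 1, 1]

contexts = build_global_contexts(bows, cluster_ids)
targets = augmented_targets(bows, cluster_ids, contexts, eta=0.5)
print(targets[0])
```

In this toy example, the first document's target gains weight on a word it never contains because a cluster neighbor does, which is one way a denser target could ease the label sparsity the abstract describes.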
