LG | AI | Jul 25, 2024

An Iterative Approach to Topic Modelling

arXiv:2407.17892v1 | h-index: 2
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of topic quality assessment for researchers and practitioners in text analysis, but it is incremental as it builds on existing methods without introducing a new paradigm.

The authors tackle the problem of assessing and improving topic modelling results by proposing an iterative process built on BERTopic. On a subset of the COVIDSenti-A dataset, the process arrives at a set of topics that could not be further improved according to the chosen clustering comparison measure.

Topic modelling has become increasingly popular for summarizing text data, such as social media posts and articles. However, topic modelling is usually completed in one shot, and assessing the quality of the resulting topics is challenging. No effective methods or measures have been developed for assessing the results or for making further enhancements to the topics. In this research, we propose an iterative process for topic modelling that gives rise to a sense of completeness of the resulting topics once the process is complete. Using the BERTopic package, a popular method in topic modelling, we demonstrate how the modelling process can be applied iteratively to arrive at a set of topics that could not be further improved upon, using one of three selected measures for clustering comparison as the decision criterion. This demonstration is conducted using a subset of the COVIDSenti-A dataset. The early success leads us to believe that further research using this approach in conjunction with other topic modelling algorithms could be viable.
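The iterative idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not name the three clustering comparison measures or the exact stopping rule, so this sketch assumes the plain Rand index as the measure and stops when two consecutive rounds assign documents to topics in the same way (up to relabelling). The names `rand_index`, `iterate_topics`, `fit_round`, `threshold`, and `max_rounds` are hypothetical.

```python
from itertools import combinations
from typing import Callable, List, Optional, Sequence, Tuple

Labels = List[int]
# Hypothetical signature: one modelling round takes the documents and the
# previous round's topic labels (None on the first round) and returns new labels.
FitRound = Callable[[Sequence[str], Optional[Labels]], Labels]


def rand_index(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of document pairs on which two clusterings agree.

    A pair counts as agreement when both clusterings put the two documents
    together, or both put them apart. Label names themselves do not matter.
    """
    if len(a) != len(b):
        raise ValueError("label sequences must have equal length")
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 1.0
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def iterate_topics(
    docs: Sequence[str],
    fit_round: FitRound,
    threshold: float = 1.0,
    max_rounds: int = 10,
) -> Tuple[Labels, int]:
    """Repeat topic modelling until consecutive rounds stop changing.

    Assumed stopping rule: stop once the Rand index between the previous
    and current assignments reaches `threshold`, or after `max_rounds`.
    Returns the final labels and the number of rounds that were run.
    """
    previous: Optional[Labels] = None
    labels: Labels = []
    for round_no in range(1, max_rounds + 1):
        labels = fit_round(docs, previous)
        if previous is not None and rand_index(previous, labels) >= threshold:
            return labels, round_no
        previous = labels
    return labels, max_rounds
```

In practice, `fit_round` could wrap a call such as `BERTopic().fit_transform(docs)`, which returns topic assignments and probabilities. How each round should use the previous round's output is left open here, since the abstract does not describe it.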
