CL, LG · Aug 19, 2021

A Framework for Neural Topic Modeling of Text Corpora

arXiv:2108.08946v1 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work gives researchers and practitioners in fields such as NLP and data analysis a tool for performing topic modeling efficiently. Its contribution is incremental, however, because it combines existing methods into a single framework.

The authors address topic modeling of text corpora by introducing FAME, an open-source framework that integrates a range of textual features, from traditional frequency-based representations to modern transformer embeddings, to discover topics and cluster semantically similar documents. They demonstrate its effectiveness on the News-Group dataset.

Topic Modeling refers to the problem of discovering the main topics that occur in corpora of textual data, with solutions finding crucial applications in numerous fields. In this work, inspired by recent advancements in the Natural Language Processing domain, we introduce FAME, an open-source framework that provides an efficient mechanism for extracting and incorporating textual features and using them to discover topics and to cluster semantically similar text documents in a corpus. These features range from traditional approaches (e.g., frequency-based) to the most recent auto-encoding embeddings from transformer-based language models such as the BERT model family. To demonstrate the effectiveness of this library, we conducted experiments on the well-known News-Group dataset. The library is available online.
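The abstract describes a two-stage pipeline: turn each document into a feature vector, then group similar vectors so that each group's dominant terms describe a topic. FAME's actual interface is not given here, so the following is only a minimal, standard-library sketch of that general idea under stated assumptions. It uses TF-IDF as the frequency-based feature and spherical k-means (k-means on unit vectors with cosine similarity) as the clustering step. The toy corpus, the stopword list, the function names (tfidf_vectors, spherical_kmeans, top_terms), and the choice of k-means itself are hypothetical illustrations, not the authors' method or the library's API.

```python
import math
from collections import Counter

# Illustrative sketch only: this is NOT FAME's API. TF-IDF features and
# spherical k-means are assumed stand-ins for the framework's components.

STOPWORDS = {"the", "a", "in", "up", "new"}  # hypothetical toy stopword list


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def normalize(vec):
    """Scale a sparse {term: weight} vector to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """Frequency-based features: unit-length TF-IDF vectors, one per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF: log((1 + n) / (1 + df)) + 1, scikit-learn's default formula.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [normalize({t: c * idf[t] for t, c in Counter(toks).items()})
            for toks in tokenized]


def cosine(u, v):
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller sparse vector
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def spherical_kmeans(vectors, k, max_iter=20):
    """Cluster unit vectors by cosine similarity; returns (labels, centers)."""
    # Deterministic farthest-point initialization starting from document 0.
    centers = [vectors[0]]
    while len(centers) < k:
        idx = min(range(len(vectors)),
                  key=lambda i: max(cosine(vectors[i], c) for c in centers))
        centers.append(vectors[idx])
    labels = None
    for _ in range(max_iter):
        # Assign each document to its most similar center.
        new_labels = [max(range(k), key=lambda j: cosine(v, centers[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if not members:
                continue  # keep the previous center for an empty cluster
            summed = Counter()
            for v in members:
                summed.update(v)  # sum, then normalize: same direction as the mean
            centers[j] = normalize(dict(summed))
    return labels, centers


def top_terms(center, n=3):
    """Highest-weighted centroid terms serve as a cluster's topic descriptor."""
    ranked = sorted(center.items(), key=lambda item: (-item[1], item[0]))
    return [t for t, _ in ranked[:n]]


docs = [
    "the team won the football match",
    "the striker scored a goal in the match",
    "fans cheered the football team",
    "the new gpu speeds up neural network training",
    "training a neural network needs a fast gpu",
    "the software update improves gpu drivers",
]
labels, centers = spherical_kmeans(tfidf_vectors(docs), k=2)
topics = [top_terms(c) for c in centers]
print(labels)
print(topics)
```

On this toy corpus the sports sentences and the hardware sentences fall into separate clusters, and each centroid's top terms read as a rough topic label. Under the same assumptions, replacing tfidf_vectors with unit-normalized transformer embeddings (for example, from a BERT-family encoder) would leave the clustering loop unchanged. Only the feature-extraction step would differ, which is the kind of interchangeability the abstract attributes to the framework.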
