IR · CL · LG · Nov 22, 2021

Keyword Assisted Embedded Topic Model

arXiv:2112.03101v1 · 28 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for topic models that make efficient use of a user's prior knowledge when analyzing text corpora. It is an incremental improvement over existing guided models.

The paper tackles the problem of incorporating user knowledge into topic models by proposing the Keyword Assisted Embedded Topic Model (KeyETM), which uses informative topic-level priors over the vocabulary to improve topic quality. On quantitative metrics and in a human topic intrusion evaluation, KeyETM produces better topics than other guided models.

By illuminating latent structures in a corpus of text, topic models are an essential tool for categorizing, summarizing, and exploring large collections of documents. Probabilistic topic models, such as latent Dirichlet allocation (LDA), describe how words in documents are generated via a set of latent distributions called topics. Recently, the Embedded Topic Model (ETM) has extended LDA to utilize the semantic information in word embeddings to derive semantically richer topics. As LDA and its extensions are unsupervised models, they are not designed to make efficient use of a user's prior knowledge of the domain. To this end, we propose the Keyword Assisted Embedded Topic Model (KeyETM), which equips ETM with the ability to incorporate user knowledge in the form of informative topic-level priors over the vocabulary. Using both quantitative metrics and human responses on a topic intrusion task, we demonstrate that KeyETM produces better topics than other guided, generative models in the literature.
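
To make the idea of a topic-level prior over the vocabulary more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. The only part taken from ETM is that a topic's word distribution is a softmax over inner products between word embeddings and a topic embedding. The `keyword_prior` function, its `boost` parameter, the KL penalty, and the toy vocabulary and embeddings are all hypothetical choices made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def etm_topic_word(word_embeddings, topic_embedding):
    """ETM-style topic: softmax over word/topic embedding inner products."""
    scores = [
        sum(w * t for w, t in zip(vec, topic_embedding))
        for vec in word_embeddings
    ]
    return softmax(scores)

def keyword_prior(vocab, keywords, boost=5.0):
    """Hypothetical informative prior: extra mass on user keywords.

    Assumption for illustration only; the paper may define its prior differently.
    """
    weights = [1.0 + boost if word in keywords else 1.0 for word in vocab]
    total = sum(weights)
    return [w / total for w in weights]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy data (made up): two "sports" words and two "politics" words.
vocab = ["goal", "match", "vote", "senate"]
word_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
topic_embedding = [2.0, 0.0]  # points toward the "sports" direction

beta = etm_topic_word(word_embeddings, topic_embedding)
prior = keyword_prior(vocab, {"goal", "match"})

# One possible guidance term: penalize topics that drift from the keyword prior.
penalty = kl_divergence(prior, beta)
```

In a sketch like this, a penalty of this kind would be added to the model's training objective so that each guided topic stays close to the words the user supplied, while the embeddings let related words that were not listed still receive probability mass.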

Code Implementations: 1 repo
