ML | IR | LG | Feb 6, 2021

Exclusive Topic Modeling

arXiv:2102.03525v1
Originality: Incremental advance
AI Analysis

This work addresses the problem of identifying field-specific keywords and generating more coherent topics for researchers and practitioners in text analysis, offering an incremental improvement over existing methods.

This paper introduces Exclusive Topic Modeling (ETM) for unsupervised text classification, designed to identify field-specific keywords and produce well-structured topics with exclusive words. It achieves this by using a weighted Lasso penalty to mitigate the influence of frequent but less relevant words and a pairwise Kullback-Leibler divergence penalty for topic separation. ETM improved topic coherence scores on the NIPS dataset by 22% with the weighted Lasso penalty and 10% with the pairwise Kullback-Leibler divergence penalty.

We propose Exclusive Topic Modeling (ETM) for unsupervised text classification, which is able to 1) identify field-specific keywords even when they appear less frequently and 2) deliver well-structured topics with exclusive words. In particular, a weighted Lasso penalty is imposed to automatically reduce the dominance of frequently appearing yet less relevant words, and a pairwise Kullback-Leibler divergence penalty is used to separate topics. Simulation studies demonstrate that ETM detects the field-specific keywords, while LDA fails. When applied to the benchmark NIPS dataset, the topic coherence score improves on average by 22% and 10% for the model with the weighted Lasso penalty and the pairwise Kullback-Leibler divergence penalty, respectively.
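To make the two penalty terms named in the abstract concrete, here is a minimal Python sketch of what a weighted Lasso penalty and a pairwise Kullback-Leibler divergence between topics might look like. The function names, the per-word `weights` vector, and the choice to sum the divergence over all ordered topic pairs are assumptions made for illustration. The abstract does not give the exact objective, so this should not be read as the paper's formulation.

```python
import numpy as np

def weighted_lasso_penalty(beta, weights):
    """Weighted L1 penalty: sum over topics k and words v of weights[v] * |beta[k, v]|.

    beta    : (n_topics, n_words) array of topic-word parameters (hypothetical name)
    weights : (n_words,) array of per-word weights; how the weights are chosen
              is an assumption here and is not specified in the abstract
    """
    return float(np.sum(weights[None, :] * np.abs(beta)))

def pairwise_kl_divergence(topics, eps=1e-12):
    """Sum of KL(topic_i || topic_j) over all ordered pairs i != j.

    topics : (n_topics, n_words) array; each row is renormalized to a
             probability distribution after clipping at eps to avoid log(0)
    """
    p = np.clip(topics, eps, None)
    p = p / p.sum(axis=1, keepdims=True)
    log_p = np.log(p)
    # kl[i, j] = sum_v p[i, v] * (log p[i, v] - log p[j, v]); the diagonal is 0
    kl = np.sum(p * log_p, axis=1)[:, None] - p @ log_p.T
    return float(kl.sum())

# Toy example with two topics over a three-word vocabulary
overlapping = np.array([[0.4, 0.3, 0.3],
                        [0.3, 0.4, 0.3]])
separated = np.array([[0.9, 0.05, 0.05],
                      [0.05, 0.05, 0.9]])
print(pairwise_kl_divergence(overlapping))
print(pairwise_kl_divergence(separated))
```

In this sketch, topics that put their mass on different words yield a larger pairwise divergence, so an objective that rewards this quantity would push topics apart, while larger entries in `weights` would shrink the corresponding words more strongly.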

