ML, IR, LG | Feb 6, 2021

Concentrated Document Topic Model

arXiv:2102.04449v1
AI Analysis

This work provides an incremental improvement in topic modeling for researchers and practitioners working with text classification, by producing more interpretable and sparse topic assignments.

This paper introduces the Concentrated Document Topic Model (CDTM) for unsupervised text classification, which addresses the issue of diffuse document-topic distributions. By applying an exponential entropy penalty, the model successfully generates more concentrated and sparse document-topic distributions, leading to more coherent topics compared to Latent Dirichlet Allocation (LDA) on the NIPS dataset.

We propose a Concentrated Document Topic Model (CDTM) for unsupervised text classification, which is able to produce a concentrated and sparse document-topic distribution. In particular, an exponential entropy penalty is imposed on the document-topic distribution. Documents that have diverse topic distributions are penalized more, while those having concentrated topics are penalized less. We apply the model to the benchmark NIPS dataset and observe more coherent topics and more concentrated and sparse document-topic distributions than Latent Dirichlet Allocation (LDA).
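To illustrate why an entropy-based penalty favors concentrated documents, here is a minimal sketch in Python. The abstract does not give the exact form of the penalty, so this assumes the simplest reading: the penalty for a document is the exponential of the Shannon entropy of its topic distribution. The function name `exponential_entropy` and the example vectors are hypothetical and are not taken from the paper.

```python
import numpy as np


def exponential_entropy(theta):
    """Return exp(H(theta)) for a document-topic distribution theta.

    Assumption: the penalty is exp of the Shannon entropy (natural log).
    The paper's actual penalty may be scaled or parameterized differently.
    """
    theta = np.asarray(theta, dtype=float)
    # Normalize so the entries sum to 1 along the topic axis.
    theta = theta / theta.sum(axis=-1, keepdims=True)
    # Replace zeros with 1.0 inside the log so that 0 * log(0) counts as 0.
    safe = np.where(theta > 0, theta, 1.0)
    entropy = -np.sum(theta * np.log(safe), axis=-1)
    return np.exp(entropy)


# Three hypothetical documents over 10 topics.
diffuse = np.full(10, 0.1)                    # mass spread evenly
concentrated = np.array([0.91] + [0.01] * 9)  # mass mostly on one topic
one_hot = np.eye(10)[0]                       # all mass on one topic

for name, doc in [("diffuse", diffuse),
                  ("concentrated", concentrated),
                  ("one_hot", one_hot)]:
    print(name, exponential_entropy(doc))
```

Under this assumed form, the penalty can be read as an effective number of topics: it equals the number of topics for a uniform distribution and falls to 1 when a document uses a single topic, so diffuse documents are penalized more than concentrated ones.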
