CLApr 8, 2020

Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence

arXiv:2004.03974v2747 citations
Originality Incremental advance
AI Analysis

This work addresses the issue of interpretability in topic modeling for researchers and practitioners, though it is incremental as it builds on existing neural and contextual embedding methods.

The paper tackled the problem of incoherent word groups in topic models by combining contextualized embeddings with neural topic models, resulting in more meaningful and coherent topics compared to traditional and recent neural models.

Topic models extract groups of words from documents, whose interpretation as a topic hopefully allows for a better understanding of the data. However, the resulting word groups are often not coherent, making them harder to interpret. Recently, neural topic models have shown improvements in overall coherence. Concurrently, contextual embeddings have advanced the state of the art of neural models in general. In this paper, we combine contextualized representations with neural topic models. We find that our approach produces more meaningful and coherent topics than traditional bag-of-words topic models and recent neural models. Our results indicate that future improvements in language models will translate into better topic models.

Code Implementations3 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes