CL · CY · HC · Jan 29, 2024

Improving the TENOR of Labeling: Re-evaluating Topic Models for Content Analysis

arXiv:2401.16348v2111 citations · h-index: 15 · EACL
AI Analysis

This work addresses the issue of misleading automated evaluation metrics for topic models, which is important for researchers and practitioners in natural language processing and content analysis, though it is incremental in re-evaluating existing models in a new setting.

The paper evaluates topic models for content analysis through the first interactive, task-based evaluation of neural, supervised, and classical models. The Contextual Neural Topic Model performed best in human evaluations and on cluster metrics, while LDA remained competitive with some neural models despite automated metrics suggesting otherwise.

Topic models are a popular tool for understanding text collections, but their evaluation has been a point of contention. Automated evaluation metrics such as coherence are often used; however, their validity has been questioned for neural topic models (NTMs), and they can overlook a model's benefits in real-world applications. To this end, we conduct the first evaluation of neural, supervised, and classical topic models in an interactive, task-based setting. We combine topic models with a classifier and test their ability to help humans conduct content analysis and document annotation. In simulated, real-user, and expert pilot studies, the Contextual Neural Topic Model performs best on cluster evaluation metrics and human evaluations; however, LDA is competitive with two other NTMs in our simulated experiment and user study results, contrary to what coherence scores suggest. We show that current automated metrics do not provide a complete picture of topic modeling capabilities, but the right choice of NTMs can be better than classical models on practical tasks.
