CL · LG · Dec 4, 2023

Revisiting Topic-Guided Language Models

Carolina Zheng, Keyon Vafa, David M. Blei

arXiv:2312.02331v1 · 1.33 citations · h-index: 101 · Has Code · Trans. Mach. Learn. Res.

Originality: Synthesis-oriented

AI Analysis

This work evaluates how effective topic-guided language models are, and finds that current methods do not improve over simpler baselines.

The paper compares four topic-guided language models against two baselines on four corpora. None outperforms a standard LSTM language model in held-out predictive performance, and most fail to learn good topics. A probe of the baseline further shows that its hidden states already encode topic information.

A recent line of work in natural language processing has aimed to combine language models and topic models. These topic-guided language models augment neural language models with topic models, unsupervised learning methods that can discover document-level patterns of word use. This paper compares the effectiveness of these methods in a standardized setting. We study four topic-guided language models and two baselines, evaluating the held-out predictive performance of each model on four corpora. Surprisingly, we find that none of these methods outperform a standard LSTM language model baseline, and most fail to learn good topics. Further, we train a probe of the neural language model that shows that the baseline's hidden states already encode topic information. We make public all code used for this study.
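The probing idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the probe's architecture or targets, so everything here is an assumption for illustration: the probe is a ridge-regularized linear map, the "hidden states" are synthetic vectors generated from synthetic topic proportions, and the names `fit_linear_probe`, `probe_predict`, and `r_squared` are hypothetical. It is not the authors' code.

```python
import numpy as np

def fit_linear_probe(hidden, targets, l2=1e-3):
    """Fit a ridge-regression probe from hidden states (n, d) to targets (n, k)."""
    n, d = hidden.shape
    X = np.hstack([hidden, np.ones((n, 1))])  # append a bias column
    reg = l2 * np.eye(d + 1)
    reg[-1, -1] = 0.0  # leave the bias term unpenalized
    return np.linalg.solve(X.T @ X + reg, X.T @ targets)

def probe_predict(W, hidden):
    """Apply a fitted probe to new hidden states."""
    X = np.hstack([hidden, np.ones((hidden.shape[0], 1))])
    return X @ W

def r_squared(y_true, y_pred):
    """Fraction of target variance explained by the probe."""
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum()
    return 1.0 - ss_res / ss_tot

# Synthetic stand-ins (assumption): in a real study, `hidden` would come from
# a trained language model and `topics` from a fitted topic model.
rng = np.random.default_rng(0)
topics = rng.dirichlet(np.ones(5), size=400)       # per-document topic proportions
mixing = rng.normal(size=(5, 32))                  # hypothetical topic-to-state map
hidden = topics @ mixing + 0.01 * rng.normal(size=(400, 32))

# Fit on the first 300 documents, evaluate on the remaining 100.
W = fit_linear_probe(hidden[:300], topics[:300])
r2 = r_squared(topics[300:], probe_predict(W, hidden[300:]))
print(f"held-out R^2 of the probe: {r2:.3f}")
```

A high held-out score for a simple probe like this would suggest that the topic information is linearly recoverable from the hidden states, which is the kind of evidence the abstract describes.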

