CL IR LG · Oct 7, 2017

Topic Modeling based on Keywords and Context

arXiv:1710.02650v21.822 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses issues in topic modeling for text analysis, but it is incremental as it builds on existing models like LDA with specific improvements.

The authors tackled the problem of topic models producing unintuitive topics, unnatural topic switching, and high computational demands by proposing a model based on automatically identifying keywords that influence nearby word assignments. Their method achieved gains in classification accuracy, PMI score, computational performance, and topic consistency across 9 datasets, often using fewer topics.

Current topic models often discover topics that do not match human intuition, switch topics unnaturally within documents, and have high computational demands. We address these concerns by proposing a topic model and an inference algorithm based on automatically identifying characteristic keywords for topics. Keywords influence the topic assignments of nearby words. Our algorithm learns (key)word-topic scores and self-regulates the number of topics. Inference is simple and easily parallelizable. Qualitative analysis yields results comparable to state-of-the-art models (e.g., LDA), but with different strengths and weaknesses. Quantitative analysis using 9 datasets shows gains in classification accuracy, PMI score, computational performance, and consistency of topic assignments within documents, while most often using fewer topics.
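
The idea that keywords influence the topic assignments of nearby words can be illustrated with a toy sketch. This is not the paper's inference algorithm: the score table `SCORES`, the function `assign_topics`, the fixed `window` size, and the "strongest keyword in the window wins" rule are all assumptions made for illustration, and the scores are invented rather than learned.

```python
from typing import Dict, List

# Hypothetical (key)word-topic scores. In the paper these are learned;
# the values here are made up purely for illustration.
SCORES: Dict[str, Dict[int, float]] = {
    "goal": {0: 0.9, 1: 0.1},
    "match": {0: 0.6, 1: 0.2},
    "election": {0: 0.05, 1: 0.95},
    "vote": {0: 0.1, 1: 0.8},
}


def assign_topics(
    words: List[str],
    scores: Dict[str, Dict[int, float]],
    window: int = 2,
) -> List[int]:
    """Assign each word the topic of the strongest keyword in its context.

    Assumed rule (not taken from the paper): look at the words within
    `window` positions on either side and pick the topic with the highest
    single word-topic score. Returns -1 where no scored word is in range.
    """
    assignments: List[int] = []
    for i in range(len(words)):
        lo = max(0, i - window)
        hi = min(len(words), i + window + 1)
        best_topic, best_score = -1, 0.0
        for w in words[lo:hi]:
            for topic, s in scores.get(w, {}).items():
                if s > best_score:
                    best_topic, best_score = topic, s
        assignments.append(best_topic)
    return assignments


if __name__ == "__main__":
    doc = ["the", "goal", "was", "late", "in", "the", "election", "vote"]
    print(assign_topics(doc, SCORES))
```

In this sketch, words without a score of their own (such as "the" or "late") inherit the topic of the nearest strong keyword, so neighboring words tend to share a topic instead of switching at every position.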
