Evaluating topic coherence measures
This work addresses the challenge of judging how interpretable topics are for researchers and practitioners who use topic models. Its contribution is incremental: it extends existing coherence measures rather than introducing a new paradigm.
The authors tackle the problem of evaluating topic coherence by adopting coherence measures from the philosophy of science that score pairs of word subsets, rather than only pairs of individual words, and are the first to apply such measures to topic scoring.
Topic models extract representative word sets, called topics, from word counts in documents without requiring any semantic annotations. Topics are not guaranteed to be interpretable; therefore, coherence measures have been proposed to distinguish good topics from bad ones. Studies of topic coherence have so far been limited to measures that score pairs of individual words. For the first time, we include coherence measures from scientific philosophy that score pairs of more complex word subsets and apply them to topic scoring.
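To make the difference between word-pair scoring and subset-pair scoring concrete, here is a minimal sketch. It assumes Boolean document co-occurrence probabilities and uses a smoothed log-ratio confirmation measure (PMI generalized to word sets) as the scoring function. The names `doc_prob`, `log_ratio`, `one_one`, `one_rest`, and `coherence`, the `one_rest` segmentation (each word against all remaining topic words), and the toy corpus are all hypothetical illustrations, not the exact measures or segmentations evaluated in the paper.

```python
import math
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence, Tuple

Docs = Sequence[FrozenSet[str]]
Segment = Tuple[FrozenSet[str], FrozenSet[str]]  # a (W', W*) pair of word subsets


def doc_prob(docs: Docs, words: FrozenSet[str]) -> float:
    # Boolean-document estimate: share of documents containing every word in `words`.
    return sum(1 for d in docs if words <= d) / len(docs)


def log_ratio(docs: Docs, w_prime: FrozenSet[str], w_star: FrozenSet[str],
              eps: float = 1e-12) -> float:
    # Smoothed log-ratio confirmation of W' by W* (assumed measure);
    # when both sets hold a single word this reduces to PMI.
    joint = doc_prob(docs, w_prime | w_star)
    indep = doc_prob(docs, w_prime) * doc_prob(docs, w_star)
    return math.log((joint + eps) / (indep + eps))


def one_one(topic: List[str]) -> List[Segment]:
    # Classic segmentation: every unordered pair of individual words.
    return [(frozenset([a]), frozenset([b])) for a, b in combinations(topic, 2)]


def one_rest(topic: List[str]) -> List[Segment]:
    # Hypothetical subset segmentation: each word against the set of all other words.
    return [(frozenset([w]), frozenset(topic) - {w}) for w in topic]


def coherence(docs: Docs, topic: List[str],
              segment: Callable[[List[str]], List[Segment]]) -> float:
    # Average the confirmation scores over all (W', W*) pairs of the segmentation.
    pairs = segment(topic)
    return sum(log_ratio(docs, wp, ws) for wp, ws in pairs) / len(pairs)


# Toy corpus (made up for illustration): two sports-like and market-like clusters.
docs = [frozenset(d.split()) for d in [
    "game team score",
    "team score win",
    "game win team",
    "stock market price",
    "market price trade",
]]
sports = ["game", "team", "score"]
mixed = ["game", "market", "score"]

print(coherence(docs, sports, one_one) > coherence(docs, mixed, one_one))    # True
print(coherence(docs, sports, one_rest) > coherence(docs, mixed, one_rest))  # True
```

Under both segmentations the words that co-occur score higher than the mixed topic. The design point is that `coherence` is parameterized by the segmentation: swapping `one_one` for a subset-based segmentation changes which (W', W*) pairs are confirmed, while the confirmation measure itself stays the same.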