An Automatic Approach for Document-level Topic Model Evaluation
This work addresses the need for more reliable topic model evaluation among researchers and practitioners, although its contribution is incremental, since it builds on existing extrinsic evaluation methods.
The paper tackles the problem of misleading topic model evaluation by showing large discrepancies between topic-level and document-level model quality, and proposes an automatic method for predicting model quality from document-level analysis, supported by empirical evidence of its robustness.
Topic models jointly learn topics and document-level topic distributions. Extrinsic evaluation of topic models tends to focus exclusively on topic-level evaluation, e.g., by assessing the coherence of topics. We demonstrate that there can be large discrepancies between topic- and document-level model quality, and that basing model evaluation on topic-level analysis can be highly misleading. We propose a method for automatically predicting topic model quality based on analysis of document-level topic allocations, and provide empirical evidence for its robustness.
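To make the distinction between the two levels of evaluation concrete, the sketch below contrasts a common topic-level measure, NPMI coherence over a topic's top words, with one simple statistic computed over the document-level topic allocations θ (mean per-document entropy). The function names, the document-co-occurrence probability estimates, and the choice of entropy are assumptions for illustration only; this is not the quality-prediction method proposed in the paper, whose details are not given here.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, docs):
    """Topic-level view: average NPMI over all pairs of a topic's top words.

    Assumption: probabilities are estimated from document co-occurrence,
    where `docs` is a list of token sets and P(w) is the fraction of
    documents containing w.
    """
    n = len(docs)

    def p(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in docs) / n

    scores = []
    for wi, wj in combinations(top_words, 2):
        pij = p(wi, wj)
        if pij == 0:
            scores.append(-1.0)  # never co-occur: minimum NPMI by convention
        elif pij == 1:
            scores.append(1.0)  # co-occur in every document: maximum NPMI
        else:
            pmi = math.log(pij / (p(wi) * p(wj)))
            scores.append(pmi / -math.log(pij))
    return sum(scores) / len(scores)


def mean_doc_entropy(theta):
    """Document-level view: mean Shannon entropy (nats) of the rows of theta.

    Each row is one document's topic distribution. This is a hypothetical
    diagnostic: it describes how spread out allocations are, not whether
    they are correct.
    """
    total = 0.0
    for row in theta:
        total += -sum(p * math.log(p) for p in row if p > 0)
    return total / len(theta)


docs = [{"cat", "dog"}, {"cat", "dog"}, {"stock", "market"}, {"stock", "market"}]
print(npmi_coherence(["cat", "dog"], docs))
print(mean_doc_entropy([[1.0, 0.0], [0.5, 0.5]]))
```

The two functions read different model outputs: coherence inspects only each topic's top words, while the entropy statistic inspects only θ. A model can therefore score well on one and poorly on the other, which is the kind of discrepancy the abstract describes.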