Comprehensive Evaluation of Large Language Models for Topic Modeling
This work addresses the need for quantitative assessment of LLMs in topic modeling and offers insights for researchers and practitioners, though its contribution is incremental because it builds on prior qualitative studies.
The paper quantitatively evaluates Large Language Models for topic modeling and finds that they produce coherent and diverse topics with few hallucinations, but may focus on only parts of documents and offer limited control over topic categories through prompts.
Recent work utilizes Large Language Models (LLMs) for topic modeling, generating comprehensible topic labels for given documents. However, their performance has mainly been evaluated qualitatively, and there remains room for quantitative investigation of their capabilities. In this paper, we quantitatively evaluate LLMs from multiple perspectives: the quality of topics, the impact of LLM-specific concerns, such as hallucination and shortcuts for limited documents, and LLMs' controllability of topic categories via prompts. Our findings show that LLMs can identify coherent and diverse topics with few hallucinations but may take shortcuts by focusing only on parts of documents. We also find that their controllability is limited.
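To make the evaluation perspectives above more concrete, the following is a minimal sketch of two simple proxies one might compute over LLM-generated topics: a diversity score (the share of unique words across all topics) and an out-of-document rate (the share of topic words that never occur in the source document, used here as a crude stand-in for hallucination). The function names, the whitespace tokenization, and the metric definitions are illustrative assumptions, not the measures used in the paper.

```python
from typing import List


def topic_diversity(topics: List[List[str]]) -> float:
    """Hypothetical diversity proxy: unique topic words / all topic words.

    A value of 1.0 means no word is shared between topics.
    """
    all_words = [w.lower() for topic in topics for w in topic]
    if not all_words:
        return 0.0
    return len(set(all_words)) / len(all_words)


def out_of_document_rate(topic_words: List[str], document: str) -> float:
    """Hypothetical hallucination proxy: share of topic words absent from the document.

    Assumes naive whitespace tokenization; punctuation is not stripped.
    """
    if not topic_words:
        return 0.0
    doc_tokens = set(document.lower().split())
    missing = [w for w in topic_words if w.lower() not in doc_tokens]
    return len(missing) / len(topic_words)


if __name__ == "__main__":
    # Toy inputs, invented purely for illustration.
    topics = [["sports", "football"], ["sports", "politics"]]
    print(topic_diversity(topics))
    print(out_of_document_rate(["economy", "weather"],
                               "the economy grew this quarter"))
```

A proxy like the out-of-document rate penalizes abstract labels that legitimately summarize a document without quoting it, so in practice it would likely need to be paired with a semantic similarity check.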