LLMTaxo: Leveraging Large Language Models for Constructing Taxonomy of Factual Claims from Social Media
LLMTaxo addresses the challenge researchers and analysts face in comprehending social media content, though the contribution appears incremental because it applies existing LLMs to a specific task.
The paper tackles the problem of analyzing complex online discourse by introducing LLMTaxo, a framework that uses large language models to automatically construct hierarchical taxonomies of factual claims from social media. The resulting hierarchy reduces redundancy and improves information accessibility. Evaluations on three datasets show that GPT-4o mini consistently outperforms the other evaluated models across most metrics.
With the rapid expansion of content on social media platforms, analyzing and comprehending online discourse has become increasingly complex. This paper introduces LLMTaxo, a novel framework leveraging large language models for the automated construction of taxonomies of factual claims from social media by generating topics at multiple levels of granularity. The resulting hierarchical structure significantly reduces redundancy and improves information accessibility. We also propose dedicated taxonomy evaluation metrics to enable comprehensive assessment. Evaluations conducted on three diverse datasets demonstrate LLMTaxo's effectiveness in producing clear, coherent, and comprehensive taxonomies. Among the evaluated models, GPT-4o mini consistently outperforms others across most metrics. The framework's flexibility and low reliance on manual intervention underscore its potential for broad applicability.