LLM-XTM: Enhancing Cross-Lingual Topic Models with Large Language Models
For researchers in cross-lingual topic modeling, LLM-XTM offers a scalable and stable method to improve topic quality with less reliance on bilingual resources and without white-box LLM access.
LLM-XTM enhances cross-lingual topic models by integrating LLM-guided topic refinement with self-consistency uncertainty quantification, achieving superior topic coherence and alignment while reducing reliance on bilingual dictionaries and expensive LLM calls.
Cross-lingual topic modeling aims to discover shared semantic structures across languages, yet existing models depend on sparse bilingual resources and often yield incoherent or weakly aligned topics. Recent LLM-based refinements improve interpretability but are costly, operate at the document level, and are prone to hallucination; prior white-box approaches also require token probabilities that are unavailable in black-box settings. We propose LLM-XTM, a framework that integrates LLM-guided topic refinement with self-consistency uncertainty quantification, enabling black-box, stable, and scalable enhancement of cross-lingual topic models. Experiments on multilingual corpora show that LLM-XTM achieves superior topic coherence and alignment while reducing reliance on bilingual dictionaries and expensive LLM calls.
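
To make the idea of self-consistency uncertainty quantification concrete, the following is a minimal sketch and not the paper's actual procedure. It assumes a black-box LLM is queried several times for a refined word list for one topic, and that disagreement between the sampled lists serves as an uncertainty signal in place of token probabilities. The function name `self_consistency_refine`, the callable `sample_refinement`, the parameters `n_samples` and `min_agreement`, and the use of mean pairwise Jaccard similarity are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def self_consistency_refine(
    topic_words: Sequence[str],
    sample_refinement: Callable[[Sequence[str]], List[str]],
    n_samples: int = 5,
    min_agreement: float = 0.6,
) -> Tuple[List[str], float]:
    """Refine one topic by sampling a black-box LLM several times.

    `sample_refinement` is a hypothetical stand-in for one stochastic
    LLM call that returns a refined word list for the topic.
    Returns (kept_words, uncertainty), with uncertainty in [0, 1].
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")

    # Draw several independent refinements of the same topic.
    samples = [set(sample_refinement(topic_words)) for _ in range(n_samples)]

    # Keep only words that a sufficient fraction of the samples agree on.
    counts = Counter(word for sample in samples for word in sample)
    kept = sorted(
        word for word, count in counts.items()
        if count / n_samples >= min_agreement
    )

    # Uncertainty (an assumed measure): 1 minus the mean pairwise
    # Jaccard similarity between the sampled word sets.
    pairs = list(combinations(samples, 2))
    if not pairs:
        return kept, 0.0
    similarities = [
        len(a & b) / len(a | b) if (a | b) else 1.0
        for a, b in pairs
    ]
    uncertainty = 1.0 - sum(similarities) / len(similarities)
    return kept, uncertainty
```

Under these assumptions, a topic whose samples largely agree gets a low uncertainty score and its refined words can be trusted, while a topic with high disagreement could fall back to the original topic model output. No token probabilities are needed, which is what makes such a scheme compatible with black-box access.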