CL, May 5

LLM-XTM: Enhancing Cross-Lingual Topic Models with Large Language Models

Minh Chu Xuan, Tien-Phat Nguyen, Linh Ngo Van, Dinh Viet Sang, Nguyen Thi Ngoc Diep, Trung Le

arXiv:2605.03299

AI Analysis

For researchers in cross-lingual topic modeling, LLM-XTM offers a scalable and stable method to improve topic quality without requiring bilingual resources or white-box LLM access.

LLM-XTM enhances cross-lingual topic models by integrating LLM-guided topic refinement with self-consistency uncertainty quantification, achieving superior topic coherence and alignment while reducing reliance on bilingual dictionaries and expensive LLM calls.

Cross-lingual topic modeling aims to discover shared semantic structures across languages, yet existing models depend on sparse bilingual resources and often yield incoherent or weakly aligned topics. Recent LLM-based refinements improve interpretability but are costly, document-level, and prone to hallucination, with prior white-box approaches requiring inaccessible token probabilities. We propose LLM-XTM, a framework that integrates LLM-guided topic refinement with self-consistency uncertainty quantification, enabling black-box, stable, and scalable enhancement of cross-lingual topic models. Experiments on multilingual corpora show that LLM-XTM achieves superior topic coherence and alignment while reducing reliance on bilingual dictionaries and expensive LLM calls.
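The abstract does not spell out how self-consistency uncertainty quantification is computed, so the following is only a generic sketch of the idea and not the authors' algorithm. It assumes a black-box refiner that returns a list of topic words each time it is called, and it treats the fraction of samples in which a word appears as a confidence score, which needs no token probabilities. The names `self_consistency_refine`, `sample_fn`, `n_samples`, `min_agreement`, and `make_stub_sampler` are hypothetical, and the stub sampler stands in for a real LLM call.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def self_consistency_refine(
    topic_words: List[str],
    sample_fn: Callable[[List[str]], List[str]],
    n_samples: int = 5,
    min_agreement: float = 0.6,
) -> List[Tuple[str, float]]:
    """Query a black-box refiner several times and keep the words it agrees on.

    Hypothetical illustration: the agreement rate (share of samples that
    contain a word) serves as a confidence proxy in place of token
    probabilities, which a black-box LLM does not expose.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")

    counts: Counter = Counter()
    for _ in range(n_samples):
        # set() so a word repeated within one response is counted once
        counts.update(set(sample_fn(topic_words)))

    scored = [
        (word, count / n_samples)
        for word, count in counts.items()
        if count / n_samples >= min_agreement
    ]
    # Highest agreement first; ties broken alphabetically for determinism
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def make_stub_sampler(responses: Iterable[List[str]]) -> Callable[[List[str]], List[str]]:
    """Stand-in for an LLM call: replays canned responses in order."""
    iterator = iter(responses)

    def sample(_topic_words: List[str]) -> List[str]:
        return next(iterator)

    return sample


# Five simulated refinements of one noisy topic (assumed data, not from the paper)
responses = [
    ["market", "stock", "trade"],
    ["market", "stock", "bank"],
    ["market", "trade", "stock"],
    ["market", "price", "stock"],
    ["market", "stock", "trade"],
]
refined = self_consistency_refine(
    ["market", "stok", "the"], make_stub_sampler(responses)
)
print(refined)
```

In this toy run, "market" and "stock" appear in all five samples and "trade" in three, so they pass the 0.6 threshold, while "bank" and "price" (one sample each) are treated as unstable suggestions and dropped.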
