AI

Dec 23, 2024

Enhancing Topic Interpretability for Neural Topic Modeling through Topic-wise Contrastive Learning

Xin Gao, Yang Lin, Ruiqing Li, Yasha Wang, Xu Chu, Xinyu Ma, Hailong Yu

arXiv:2412.17338v14.24 citations

h-index: 14

ICDE

Originality: Incremental advance

AI Analysis

This work addresses the need for more interpretable topic models in data mining and knowledge discovery. It is incremental, however: it builds on existing NTM methods and contributes a novel regularization approach.

The paper addresses a known weakness of neural topic models (NTMs): training that overemphasizes likelihood maximization often yields topics that are hard to interpret. It introduces ContraTopic, a framework that adds a topic-wise contrastive learning regularizer to the training objective. On three diverse datasets, the authors report consistently better interpretability than state-of-the-art NTMs.

Data mining and knowledge discovery are essential aspects of extracting valuable insights from vast datasets. Neural topic models (NTMs) have emerged as a valuable unsupervised tool in this field. However, the predominant objective in NTMs, which aims to discover topics maximizing data likelihood, is often misaligned with the central goal of data mining and knowledge discovery, which is to reveal interpretable insights from large data repositories. Overemphasizing likelihood maximization without incorporating topic regularization can lead to an overly expansive latent space for topic modeling. In this paper, we present an innovative approach to NTMs that addresses this misalignment by introducing contrastive learning measures to assess topic interpretability. We propose a novel NTM framework, named ContraTopic, that integrates a differentiable regularizer capable of evaluating multiple facets of topic interpretability throughout the training process. Our regularizer adopts a unique topic-wise contrastive methodology, fostering both internal coherence within topics and clear external distinctions among them. Comprehensive experiments conducted on three diverse datasets demonstrate that our approach consistently produces topics with superior interpretability compared to state-of-the-art NTMs.
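
The abstract does not give the regularizer's formula, so the following is only a rough sketch of the general idea, not the authors' method. It assumes an InfoNCE-style score in which the top words of the same topic act as positives and the top words of other topics act as negatives. The function names, the toy similarity matrix `sim`, the temperature `tau`, and the hard top-word selection are all assumptions made for illustration. Hard selection in particular is not differentiable, whereas the paper describes a differentiable regularizer.

```python
import math

def top_words(beta_row, n):
    """Indices of the n highest-probability words of one topic."""
    order = sorted(range(len(beta_row)), key=lambda w: beta_row[w], reverse=True)
    return order[:n]

def topic_contrastive_loss(beta, sim, n_top=2, tau=0.5):
    """Hypothetical InfoNCE-style score over topic top words (lower is better).

    beta: K x V topic-word weights; sim: V x V word-similarity matrix
    (assumed precomputed, e.g. from co-occurrence statistics).
    """
    if n_top < 2:
        raise ValueError("n_top must be >= 2 so every anchor has a positive")
    tops = [top_words(row, n_top) for row in beta]
    total, count = 0.0, 0
    for k, words in enumerate(tops):
        for i in words:
            # Positives: the other top words of the same topic.
            pos = sum(math.exp(sim[i][j] / tau) for j in words if j != i)
            # Negatives: top words of every other topic.
            neg = sum(math.exp(sim[i][j] / tau)
                      for m, other in enumerate(tops) if m != k
                      for j in other if j != i)
            total += -math.log(pos / (pos + neg))
            count += 1
    return total / count

# Toy vocabulary of 4 words: words 0 and 1 are related, as are words 2 and 3.
sim = [[1.0, 0.9, 0.1, 0.1],
       [0.9, 1.0, 0.1, 0.1],
       [0.1, 0.1, 1.0, 0.9],
       [0.1, 0.1, 0.9, 1.0]]
coherent = [[0.4, 0.4, 0.1, 0.1], [0.1, 0.1, 0.4, 0.4]]  # topics group related words
mixed = [[0.4, 0.1, 0.4, 0.1], [0.1, 0.4, 0.1, 0.4]]     # topics mix unrelated words

loss_coherent = topic_contrastive_loss(coherent, sim)
loss_mixed = topic_contrastive_loss(mixed, sim)
```

On this toy data the coherent topics receive a lower score than the mixed ones, which matches the stated aim of rewarding coherence within topics and distinctness between them. In an actual NTM, a term of this kind would be added to the likelihood objective and would need a differentiable relaxation of the top-word selection.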
