CL, HC, IR
Jun 28, 2024

Interactive Topic Models with Optimal Transport

arXiv:2406.19928v1
1 citation
Originality: Incremental advance
AI Analysis

This work addresses the need for interactive topic modeling tools that can incorporate analyst feedback, which is useful for researchers analyzing document collections with predefined categories.

The authors tackled the problem of incorporating analyst knowledge into topic modeling by developing EdTM, a label name supervised approach that uses optimal transport for globally coherent topic assignments. Their method outperformed few-shot LLM classifiers and traditional topic models like LDA in experiments while remaining robust to noisy inputs.

Topic models are widely used to analyze document collections. While they are valuable for discovering latent topics when analysts are unfamiliar with a corpus, analysts also commonly start with an understanding of the content present in it. This understanding may come from categories obtained in an initial pass over the corpus, or from a desire to analyze the corpus through a predefined set of categories derived from a high-level theoretical framework (e.g., political ideology). In these scenarios, analysts want a topic modeling approach that incorporates their understanding of the corpus while supporting various forms of interaction with the model. In this work, we present EdTM, an approach for label name supervised topic modeling. EdTM frames topic modeling as an assignment problem, leveraging LM/LLM-based document-topic affinities and using optimal transport to make globally coherent topic assignments. In experiments, we show the efficacy of our framework compared to few-shot LLM classifiers and to topic models based on clustering and LDA. Further, we show EdTM's ability to incorporate various forms of analyst feedback while remaining robust to noisy analyst inputs.
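
To make the idea of topic modeling as an assignment problem more concrete, the following is a minimal sketch of entropy-regularized optimal transport (Sinkhorn iterations) applied to a document-topic affinity matrix. It is an illustration under stated assumptions, not the authors' implementation: the function name `sinkhorn_assign`, the parameters `reg` and `n_iters`, the uniform document and topic masses, and the toy affinity values are all hypothetical. In EdTM the affinities come from an LM/LLM; here they are hard-coded.

```python
import numpy as np

def sinkhorn_assign(affinity, topic_mass=None, reg=0.1, n_iters=200):
    """Hypothetical helper: soft-assign documents to topics with Sinkhorn.

    affinity:   (n_docs, n_topics) scores, higher means a better match.
    topic_mass: share of the corpus each topic should receive
                (assumed uniform when not given).
    """
    affinity = np.asarray(affinity, dtype=float)
    n_docs, n_topics = affinity.shape
    doc_mass = np.full(n_docs, 1.0 / n_docs)  # every document weighs the same
    if topic_mass is None:
        topic_mass = np.full(n_topics, 1.0 / n_topics)

    cost = affinity.max() - affinity          # turn affinities into costs
    kernel = np.exp(-cost / reg)              # Gibbs kernel

    u = np.ones(n_docs)
    v = np.ones(n_topics)
    for _ in range(n_iters):
        u = doc_mass / (kernel @ v)           # rescale to match document masses
        v = topic_mass / (kernel.T @ u)       # rescale to match topic masses

    plan = u[:, None] * kernel * v[None, :]   # transport plan (docs x topics)
    return plan, plan.argmax(axis=1)          # hard label = heaviest topic

# Toy affinities (made up): documents 0-1 favor topic 0, documents 2-3 favor topic 1.
affinity = np.array([[0.9, 0.1],
                     [0.8, 0.2],
                     [0.2, 0.8],
                     [0.1, 0.9]])
plan, labels = sinkhorn_assign(affinity)
print(labels.tolist())  # [0, 0, 1, 1]
```

The column constraint on the plan is what makes the assignment global rather than per-document: each topic can only absorb its share of the corpus, so one document's assignment depends on the others. Changing `topic_mass` is one conceivable way to encode an analyst's prior about topic sizes, though whether EdTM exposes feedback in this form is not stated in the abstract.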

