Coordinated Topic Modeling
Coordinated topic modeling provides more interpretable corpus representations for NLP researchers and practitioners, though it appears to be an incremental improvement over existing topic modeling approaches.
The paper tackles the problem of making topic modeling more interpretable by introducing coordinated topic modeling, which uses predefined topics as semantic axes to represent text corpora. Its ECTM model, which combines topic- and document-level supervision with self-training, outperformed baselines across multiple domains.
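To make the idea of "topics as semantic axes" concrete, here is a minimal sketch, assuming each predefined topic has a reference embedding (its axis) and each document has an embedding in the same vector space. Each document is expressed as a softmax over its cosine similarities to the axes, and a corpus is summarized by averaging those per-document distributions. The topic names, the toy 3-dimensional vectors, the `temperature` value, and every function name below are hypothetical; this is not ECTM itself, which learns its representations with supervision and self-training.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def softmax(scores, temperature=1.0):
    """Numerically stable softmax; lower temperature gives sharper outputs."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def doc_topic_distribution(doc_emb, topic_axes, temperature=0.1):
    """Express one document as a distribution over the fixed topic axes."""
    scores = [cosine(doc_emb, axis) for axis in topic_axes]
    return softmax(scores, temperature)


def corpus_profile(doc_embs, topic_axes, temperature=0.1):
    """Average per-document distributions into one corpus-level profile."""
    dists = [doc_topic_distribution(d, topic_axes, temperature) for d in doc_embs]
    k = len(topic_axes)
    return [sum(d[j] for d in dists) / len(dists) for j in range(k)]


# Hypothetical axes: one reference embedding per predefined topic.
topic_names = ["sports", "politics", "science"]
topic_axes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Two toy corpora of document embeddings (made-up values).
corpus_a = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
corpus_b = [[0.1, 0.2, 0.9], [0.0, 0.3, 0.8]]

profile_a = corpus_profile(corpus_a, topic_axes)
profile_b = corpus_profile(corpus_b, topic_axes)
for name, pa, pb in zip(topic_names, profile_a, profile_b):
    print(f"{name:>8}: A={pa:.2f}  B={pb:.2f}")
```

Because both corpora are described on the same fixed axes, their profiles can be compared position by position; a learned, corpus-specific model would replace the raw cosine scores while keeping the axes shared.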
We propose a new problem called coordinated topic modeling that imitates how humans describe a text corpus. It treats a set of well-defined topics as the axes of a semantic space with a reference representation, and then uses these axes to model a corpus, yielding an easily understandable representation. This new task makes corpus representations more interpretable by reusing existing knowledge, and it benefits corpus comparison. We design ECTM, an embedding-based coordinated topic model that uses the reference representation to capture aspects specific to the target corpus while preserving each topic's global semantics. In ECTM, we introduce topic- and document-level supervision together with a self-training mechanism to solve the problem. Finally, extensive experiments on multiple domains show that our model outperforms the baselines.
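The abstract names a self-training mechanism without specifying it. As an illustration only, the sketch below shows one widely used form of self-training for soft document-to-topic assignments: the sharpened target distribution from Deep Embedded Clustering (DEC), where each probability is squared, divided by the topic's soft frequency, and renormalized, and the model is then trained to minimize KL(P || Q). This is an assumed, swapped-in technique, not ECTM's actual objective; the matrix `q` and the function names are hypothetical.

```python
import math


def target_distribution(q):
    """DEC-style sharpened targets (an assumption, not ECTM's method):
    square each probability, divide by the topic's soft frequency,
    then renormalize each document's row to sum to 1."""
    k = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(k)]
    p = []
    for row in q:
        weighted = [row[j] ** 2 / freq[j] for j in range(k)]
        total = sum(weighted)
        p.append([w / total for w in weighted])
    return p


def kl_divergence(p, q, eps=1e-12):
    """Mean KL(P || Q) over documents: the loss a model would minimize."""
    total = 0.0
    for prow, qrow in zip(p, q):
        total += sum(
            pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(prow, qrow)
        )
    return total / len(p)


# Hypothetical current document-topic probabilities (rows sum to 1).
q = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.4, 0.4, 0.2]]

p = target_distribution(q)
loss = kl_divergence(p, q)
for row in p:
    print([round(x, 3) for x in row])
print(f"KL(P || Q) = {loss:.4f}")
```

In a full self-training loop, the targets `p` would be recomputed periodically from the model's latest predictions, and the model would be updated to reduce the KL loss between rounds, so confident assignments are reinforced while the frequency normalization discourages any single topic from absorbing every document.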