LG, AI, CL, IT | Jul 3, 2023

vONTSS: vMF based semi-supervised neural topic modeling with optimal transport

Amazon
arXiv:2307.01226v2 | 228 citations | h-index: 9
Originality: Incremental advance
AI Analysis

Neural topic models see limited real-world use because human knowledge is hard to integrate into them. This work addresses that problem and offers an incremental improvement for researchers and practitioners in text analysis.

The paper proposes vONTSS, a semi-supervised neural topic modeling method that combines von Mises-Fisher variational autoencoders with optimal transport. vONTSS outperforms existing semi-supervised topic modeling methods in classification accuracy and diversity. It is also faster than the state-of-the-art weakly supervised text classification method while achieving similar classification performance.

Recently, Neural Topic Models (NTM), inspired by variational autoencoders, have attracted a lot of research interest; however, these methods have limited applications in the real world due to the challenge of incorporating human knowledge. This work presents a semi-supervised neural topic modeling method, vONTSS, which uses von Mises-Fisher (vMF) based variational autoencoders and optimal transport. When a few keywords per topic are provided, vONTSS in the semi-supervised setting generates potential topics and optimizes topic-keyword quality and topic classification. Experiments show that vONTSS outperforms existing semi-supervised topic modeling methods in classification accuracy and diversity. vONTSS also supports unsupervised topic modeling. Quantitative and qualitative experiments show that vONTSS in the unsupervised setting outperforms recent NTMs on multiple aspects: vONTSS discovers highly clustered and coherent topics on benchmark datasets. It is also much faster than the state-of-the-art weakly supervised text classification method while achieving similar classification performance. We further prove the equivalence of optimal transport loss and cross-entropy loss at the global minimum.

