Interpretable Topic Extraction and Word Embedding Learning using row-stochastic DEDICOM
This work addresses the need for interpretable topic modeling and word embeddings in natural language processing, but it appears incremental: it modifies an existing algorithm without demonstrating broad improvements over the state of the art.
The paper tackles the problem of extracting interpretable topics and learning word embeddings by applying a row-stochastic variation of the DEDICOM algorithm to the pointwise mutual information matrices of text corpora. The resulting method identifies latent topic clusters and produces interpretable embeddings, both of which are evaluated qualitatively.
The DEDICOM algorithm provides a uniquely interpretable matrix factorization method for symmetric and asymmetric square matrices. We employ a new row-stochastic variation of DEDICOM on the pointwise mutual information matrices of text corpora to identify latent topic clusters within the vocabulary and simultaneously learn interpretable word embeddings. We introduce a method to efficiently train a constrained DEDICOM algorithm and present a qualitative evaluation of its topic modeling and word embedding performance.
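
To make the idea concrete, the following is a minimal sketch of a row-stochastic DEDICOM fit, not the authors' training procedure. It assumes the standard DEDICOM form S ≈ A R Aᵀ, where S is the n×n pointwise mutual information matrix, A is an n×k loading matrix, and R is a k×k affinity matrix. Several choices here are assumptions made for illustration: the row-stochastic constraint is enforced by passing an unconstrained matrix M through a row-wise softmax, the fit uses plain gradient descent on the squared Frobenius error, undefined PMI entries (zero co-occurrence) are set to 0, and the function names and the toy co-occurrence counts are hypothetical.

```python
import numpy as np

def pmi_matrix(counts):
    """PMI from a square co-occurrence count matrix.
    Assumption: entries with zero co-occurrence are set to 0."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0
    return pmi

def row_softmax(M):
    """Row-wise softmax: every row is non-negative and sums to 1."""
    e = np.exp(M - M.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def fit_row_stochastic_dedicom(S, k, steps=500, lr=0.01, seed=0):
    """Fit S ~ A R A^T with row-stochastic A = row_softmax(M).
    Assumption: plain gradient descent on the squared Frobenius error."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    M = rng.normal(scale=0.1, size=(n, k))
    R = rng.normal(scale=0.1, size=(k, k))
    losses = []
    for _ in range(steps):
        A = row_softmax(M)
        E = A @ R @ A.T - S                      # reconstruction error
        losses.append(float((E ** 2).sum()))
        grad_A = 2.0 * (E @ A @ R.T + E.T @ A @ R)
        grad_R = 2.0 * (A.T @ E @ A)
        # Chain rule through the row-wise softmax.
        grad_M = A * (grad_A - (grad_A * A).sum(axis=1, keepdims=True))
        M -= lr * grad_M
        R -= lr * grad_R
    return row_softmax(M), R, losses

# Hypothetical toy counts: six words forming two co-occurrence blocks.
counts = np.array([
    [0, 8, 7, 1, 0, 1],
    [8, 0, 9, 0, 1, 0],
    [7, 9, 0, 1, 0, 0],
    [1, 0, 1, 0, 8, 9],
    [0, 1, 0, 8, 0, 7],
    [1, 0, 0, 9, 7, 0],
], dtype=float)

A, R, losses = fit_row_stochastic_dedicom(pmi_matrix(counts), k=2)
```

Under these assumptions, each row of A can be read as a distribution of one word over the k latent topics, which also serves as that word's k-dimensional embedding, while R describes how the topics relate to one another.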