CL · Feb 24, 2017

Dirichlet-vMF Mixture Model

arXiv:1702.07495v1
Originality: Synthesis-oriented
AI Analysis

This is an incremental method for topic modeling in natural language processing, offering a continuous alternative to discrete models like LDA.

The paper tackles the problem of deriving topic embeddings from multiple sets of embedding vectors. It proposes VMFMix, a multi-document von Mises-Fisher (vMF) mixture model with a Dirichlet prior that is analogous to LDA but defined on a continuous hypersphere, and reports its performance on two document classification tasks.
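For readers unfamiliar with the von Mises-Fisher distribution, a minimal sketch of its log-density on the unit sphere in R^p may help: f(x; μ, κ) = C_p(κ) exp(κ μᵀx), with C_p(κ) = κ^{p/2-1} / ((2π)^{p/2} I_{p/2-1}(κ)). This is the standard textbook form, not code from the paper; the function names are hypothetical, and the example assumes NumPy and SciPy (`scipy.special.ive`, the exponentially scaled modified Bessel function of the first kind).

```python
import numpy as np
from scipy.special import ive


def vmf_log_normalizer(p, kappa):
    """log C_p(kappa) for the vMF distribution on the unit sphere in R^p (kappa > 0)."""
    v = p / 2.0 - 1.0
    # ive(v, k) = I_v(k) * exp(-k), so log I_v(k) = log(ive(v, k)) + k;
    # this avoids overflow of I_v for large kappa.
    log_bessel = np.log(ive(v, kappa)) + kappa
    return v * np.log(kappa) - (p / 2.0) * np.log(2.0 * np.pi) - log_bessel


def vmf_log_density(x, mu, kappa):
    """Log-density of a unit vector x under vMF(mu, kappa); mu must be unit-norm."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    p = mu.shape[-1]
    return vmf_log_normalizer(p, kappa) + kappa * np.dot(mu, x)
```

For p = 3 the normalizer reduces to κ / (4π sinh κ), which is a convenient sanity check. Because C_p(κ) depends only on p and κ, it is identical for all topics that share a concentration parameter.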

This document is about the multi-document von Mises-Fisher mixture model with a Dirichlet prior, referred to as VMFMix. VMFMix is analogous to Latent Dirichlet Allocation (LDA) in that both can capture co-occurrence patterns across multiple documents. The difference is that in VMFMix, the topic-word distribution is defined on a continuous n-dimensional hypersphere. Hence VMFMix can be used to derive topic embeddings, i.e., representative vectors, from multiple sets of embedding vectors. An efficient variational Expectation-Maximization (EM) inference algorithm is derived. The performance of VMFMix on two document classification tasks is reported, along with some preliminary analysis.
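The variational EM inference can be illustrated with a simplified sketch. It assumes LDA-style mean-field updates (a Dirichlet posterior `gamma` per document and topic responsibilities `phi` per vector), a single concentration κ shared by all topics and held fixed, and unit-normalized embeddings. The paper's actual updates, including how κ is estimated, may differ, and all function names here are hypothetical.

```python
import numpy as np
from scipy.special import digamma, logsumexp


def e_step_document(X, mu, kappa, alpha, n_iter=50, tol=1e-6):
    """Mean-field E-step for one document (assumed LDA-style updates).

    X:     (N, p) unit-norm embedding vectors of the document
    mu:    (K, p) unit-norm topic embeddings
    kappa: concentration shared by all topics (assumption)
    alpha: (K,) Dirichlet prior parameters
    Returns phi (N, K) responsibilities and gamma (K,) Dirichlet posterior.
    """
    alpha = np.asarray(alpha, dtype=float)
    N, K = X.shape[0], mu.shape[0]
    # With a shared kappa, log C_p(kappa) is the same for every topic and
    # cancels in the normalization of phi, so only kappa * mu_k^T x_i is kept.
    log_lik = kappa * X @ mu.T                       # (N, K)
    gamma = alpha + N / K                            # uniform initialization
    for _ in range(n_iter):
        # E_q[log theta_k] under Dirichlet(gamma)
        e_log_theta = digamma(gamma) - digamma(gamma.sum())
        log_phi = log_lik + e_log_theta
        phi = np.exp(log_phi - logsumexp(log_phi, axis=1, keepdims=True))
        new_gamma = alpha + phi.sum(axis=0)
        converged = np.max(np.abs(new_gamma - gamma)) < tol
        gamma = new_gamma
        if converged:
            break
    return phi, gamma


def m_step_means(docs, phis):
    """Update topic embeddings: normalized responsibility-weighted sums of vectors."""
    K, p = phis[0].shape[1], docs[0].shape[1]
    r = np.zeros((K, p))
    for X, phi in zip(docs, phis):
        r += phi.T @ X                               # sum_i phi_ik * x_i
    norms = np.linalg.norm(r, axis=1, keepdims=True)
    return r / np.maximum(norms, 1e-12)              # project back onto the sphere
```

Alternating `e_step_document` over all documents with `m_step_means` gives a basic EM loop for the topic embeddings; projecting the weighted sum back onto the unit sphere is the usual maximum-likelihood update for a vMF mean direction.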

Code implementations: 1 repo