ML, LG · Oct 30, 2017

Convergence Rates of Latent Topic Models Under Relaxed Identifiability Conditions

arXiv:1710.11070v2 · 10 citations

Originality: Incremental advance

AI Analysis

This paper provides theoretical guarantees for topic modeling in natural language processing, addressing a foundational statistical question of interest to researchers in machine learning and statistics. Its contribution is incremental, extending prior work.

The paper establishes convergence rates for Latent Dirichlet Allocation topic models under relaxed identifiability conditions: it shows that the maximum likelihood estimator converges at a rate of $n^{-1/4}$ in Wasserstein distance and proves that this rate is optimal in the worst case.
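To get a feel for what an $n^{-1/4}$ rate implies, the sketch below compares it with the familiar parametric $n^{-1/2}$ rate. The constant `c = 1.0` and the helper names `error_bound` and `sample_factor` are illustrative assumptions, not quantities taken from the paper.

```python
def error_bound(n: int, exponent: float, c: float = 1.0) -> float:
    """Hypothetical error bound c * n**(-exponent) after n documents."""
    return c * n ** (-exponent)


def sample_factor(error_reduction: float, exponent: float) -> float:
    """Factor by which n must grow to shrink the bound by `error_reduction`."""
    return error_reduction ** (1.0 / exponent)


if __name__ == "__main__":
    # Compare the two rates at a few corpus sizes (c = 1.0 is an assumption).
    for n in (10**2, 10**4, 10**6):
        print(n, error_bound(n, 0.25), error_bound(n, 0.5))
    # Growth in n needed to halve the bound under each rate.
    print(sample_factor(2.0, 0.25), sample_factor(2.0, 0.5))
```

Under these assumptions, halving an $n^{-1/4}$ bound takes 16 times as many documents, against 4 times for an $n^{-1/2}$ bound.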

In this paper we study the frequentist convergence rate for the Latent Dirichlet Allocation (Blei et al., 2003) topic model. We show that the maximum likelihood estimator converges to one of the finitely many equivalent parameters in the Wasserstein distance at a rate of $n^{-1/4}$, without assuming separability or non-degeneracy of the underlying topics or the existence of more than three words per document, thus generalizing the previous work of Anandkumar et al. (2012, 2014) from an information-theoretic perspective. We also show that the $n^{-1/4}$ convergence rate is optimal in the worst case.
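The phrase "one of the finitely many equivalent parameters" reflects that topics are only identifiable up to relabeling, so a sensible error measure should not depend on topic order. The sketch below is one way to illustrate this, not the paper's definition: it assumes uniform weights on the topics and an L1 ground cost between topic vectors, and the name `topic_wasserstein` is hypothetical. With uniform weights and equal numbers of atoms, an optimal transport plan can be taken to be a permutation, so a brute-force search over permutations is exact for small numbers of topics.

```python
from itertools import permutations
from typing import Sequence

Topic = Sequence[float]  # a probability vector over the vocabulary


def l1(p: Topic, q: Topic) -> float:
    """L1 distance between two topic vectors (assumed ground cost)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def topic_wasserstein(est: Sequence[Topic], true: Sequence[Topic]) -> float:
    """W1 distance between uniform measures on two equally sized topic sets.

    Brute force over all matchings; only practical for a handful of topics.
    """
    k = len(est)
    if k != len(true):
        raise ValueError("both topic sets must have the same size")
    return min(
        sum(l1(est[i], true[perm[i]]) for i in range(k)) / k
        for perm in permutations(range(k))
    )


if __name__ == "__main__":
    true_topics = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
    relabeled = [[0.1, 0.1, 0.8], [0.7, 0.2, 0.1]]  # same topics, swapped order
    perturbed = [[0.1, 0.1, 0.8], [0.6, 0.3, 0.1]]  # one topic slightly off
    print(topic_wasserstein(relabeled, true_topics))
    print(topic_wasserstein(perturbed, true_topics))
```

A relabeled copy of the true topics is at distance zero under this measure, which is why convergence can only be stated up to the equivalent parameters. For larger numbers of topics, an assignment solver would replace the permutation search.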
