ME · LG · ML · Aug 16, 2016

A Geometrical Approach to Topic Model Estimation

arXiv:1608.04478v1 · 1 citation
AI Analysis

This work addresses a methodological bottleneck in topic modeling: how to use the Singular Value Decomposition (SVD) of a text corpus matrix to estimate the underlying topic vectors.

The authors show that a low-dimensional simplex structure links the low-rank matrix of interest to the SVD of the text corpus matrix, so that the former can be reconstructed from the latter. They build a new SVD-based estimation method on this structure, derive its rate of convergence, and support the method and theory with numerical experiments on simulated and real data.

In probabilistic topic models, the quantity of interest (a low-rank matrix consisting of topic vectors) is hidden in the text corpus matrix and masked by noise, and the Singular Value Decomposition (SVD) is a potentially useful tool for learning such a low-rank matrix. However, the connection between this low-rank matrix and the singular vectors of the text corpus matrix is usually complicated and hard to spell out, so it is challenging to use SVD for learning topic models. In this paper, we overcome the challenge by revealing a surprising insight: there is a low-dimensional simplex structure which can be viewed as a bridge between the low-rank matrix of interest and the SVD of the text corpus matrix, and which allows us to conveniently reconstruct the former using the latter. This insight motivates a new SVD approach to learning topic models, which we analyze with delicate random matrix theory to derive the rate of convergence. We support our methods and theory numerically, using both simulated data and real data.
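The abstract does not spell out how the simplex arises, so the following is only a minimal sketch of one way such a structure can appear, not the paper's algorithm. It assumes a noise-free corpus matrix D0 = A W, the presence of "anchor" words (words used by a single topic), and an entry-wise ratio normalization of the singular vectors; the variable names, sizes, and the helper `barycentric` are hypothetical and chosen for illustration. Under these assumptions, each row of the normalized singular-vector matrix is a convex combination of the K rows that belong to the anchor words.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, K = 60, 200, 3  # vocabulary size, documents, topics (illustrative values)

# Topic matrix A (p x K): each column is a probability vector over words.
A = rng.uniform(0.1, 1.0, size=(p, K))
# Assumption: the first K words are anchor words, each used by one topic only.
A[:K, :] = 0.0
A[np.arange(K), np.arange(K)] = 1.0
A /= A.sum(axis=0, keepdims=True)

# Weight matrix W (K x n): each column is one document's mixture over topics.
W = rng.dirichlet(np.ones(K), size=n).T

# Noise-free corpus matrix; an observed corpus would add sampling noise.
D0 = A @ W

# Leading K left singular vectors of the corpus matrix.
U, s, Vt = np.linalg.svd(D0, full_matrices=False)
Xi = U[:, :K]

# Assumed normalization: divide singular vectors 2..K entry-wise by the first.
R = Xi[:, 1:] / Xi[:, [0]]  # shape (p, K-1)


def barycentric(points, vertices):
    """Return weights w per point with w @ vertices = point and sum(w) = 1."""
    M = np.vstack([vertices.T, np.ones(len(vertices))])  # (K, K)
    rhs = np.vstack([points.T, np.ones(len(points))])    # (K, p)
    return np.linalg.solve(M, rhs).T                     # (p, K)


# Rows of the anchor words serve as candidate simplex vertices.
vertices = R[:K]
weights = barycentric(R, vertices)

# Nonnegative weights summing to one mean every row lies inside the simplex.
print("smallest barycentric weight:", weights.min())
```

With sampling noise added to D0, the rows would only approximately fall inside the simplex and the vertices would have to be estimated rather than read off known anchor words.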

