A New Geometric Approach to Latent Topic Modeling and Discovery
This work addresses topic discovery in text and image corpora. It offers a convex, polynomial-time solution that improves upon existing non-convex or heuristic approaches, although the contribution appears incremental.
The paper tackles latent topic modeling by developing a new geometrically motivated algorithm for nonnegative matrix factorization, which performs competitively, both qualitatively and quantitatively, with state-of-the-art methods on synthetic and real-world datasets.
A new geometrically-motivated algorithm for nonnegative matrix factorization is developed and applied to the discovery of latent "topics" for text and image "document" corpora. The algorithm is based on robustly finding and clustering extreme points of empirical cross-document word-frequencies that correspond to novel "words" unique to each topic. In contrast to related approaches that are based on solving non-convex optimization problems using suboptimal approximations, locally-optimal methods, or heuristics, the new algorithm is convex, has polynomial complexity, and has competitive qualitative and quantitative performance compared to the current state-of-the-art approaches on synthetic and real-world datasets.
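The geometric idea in the abstract can be illustrated with a small, noise-free sketch. It is not the authors' algorithm: the abstract does not specify how extreme points are found or clustered, so the random-projection step, the function name `find_novel_words`, and all sizes and seeds below are assumptions made for illustration. The sketch relies only on a standard fact: if every topic has a novel word, then after row normalization each row of the word-document matrix is a convex combination of the novel-word rows, so a random linear functional is maximized at a novel-word row.

```python
import numpy as np

def find_novel_words(A, num_projections=200, seed=0):
    """Return sorted row indices of A that win at least one random projection.

    Hypothetical helper for illustration only; the robust detection and
    clustering steps described in the abstract are not reproduced here.
    """
    rng = np.random.default_rng(seed)
    # Row-normalize so each word's cross-document frequencies sum to 1.
    A_bar = A / A.sum(axis=1, keepdims=True)
    # Each column is one random direction in document space.
    directions = rng.standard_normal((A.shape[1], num_projections))
    # The row maximizing a linear functional is an extreme point of the
    # convex hull of the rows.
    winners = np.argmax(A_bar @ directions, axis=0)
    return sorted(set(winners.tolist()))

# Synthetic corpus (assumed sizes): W words, K topics, M documents.
rng = np.random.default_rng(1)
W, K, M = 20, 3, 50
beta = rng.uniform(0.1, 1.0, size=(W, K))     # word-topic weights
beta[:K] = np.eye(K)                          # word k occurs only in topic k
beta /= beta.sum(axis=0, keepdims=True)       # each topic is a distribution over words
theta = rng.dirichlet(np.ones(K), size=M).T   # K x M topic mixtures per document
A = beta @ theta                              # ideal word-document frequencies

novel = find_novel_words(A)
print(novel)
```

In this idealized setting only the rows of words 0 to K-1 can win a projection, because every other row is a strict mixture of them. With real, noisy word counts the maximizers are only approximately extreme, which is where the robust detection and clustering mentioned in the abstract would come in.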