Robust Spectral Inference for Joint Stochastic Matrix Factorization
This paper addresses the unreliability of spectral topic modeling for researchers and practitioners working with small, noisy text data.
The paper attributes the poor performance of spectral inference methods for topic analysis to violations of the theoretical conditions required by Joint Stochastic Matrix Factorization. It proposes a rectification method that achieves results comparable to probabilistic techniques while maintaining scalability and provable optimality.
Spectral inference provides fast algorithms and provable optimality for latent topic analysis. On real data, however, these algorithms require additional ad-hoc heuristics, and even then often produce unusable results. We explain this poor performance by casting the problem of topic inference in the framework of Joint Stochastic Matrix Factorization (JSMF) and showing that previous methods violate the theoretical conditions necessary for a good solution to exist. We then propose a novel rectification method that learns high-quality topics and their interactions even on small, noisy data. This method achieves results comparable to probabilistic techniques in several domains while maintaining scalability and provable optimality.
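The abstract does not spell out the theoretical conditions or the rectification procedure, so the following is only an illustrative sketch under stated assumptions. It assumes the word co-occurrence matrix C should be symmetric positive semidefinite with rank at most K (the number of topics), entrywise nonnegative, and normalized so its entries sum to one, and it enforces these by generic alternating projections. The function names, the iteration count, and the choice of alternating projections are assumptions made for illustration, not the paper's stated algorithm.

```python
import numpy as np


def project_psd_rank_k(C, k):
    """Project onto symmetric PSD matrices of rank at most k."""
    C = (C + C.T) / 2.0                      # symmetrize first
    vals, vecs = np.linalg.eigh(C)           # eigenvalues in ascending order
    top = np.argsort(vals)[-k:]              # indices of the k largest eigenvalues
    vals_k = np.clip(vals[top], 0.0, None)   # drop any negative eigenvalues
    return (vecs[:, top] * vals_k) @ vecs[:, top].T


def project_sum_to_one(C):
    """Shift every entry equally so that the entries sum to one."""
    return C + (1.0 - C.sum()) / C.size


def project_nonnegative(C):
    """Clip negative entries to zero."""
    return np.clip(C, 0.0, None)


def rectify(C, k, n_iter=50):
    """Hypothetical rectification loop: cycle through the three projections.

    The output is exactly nonnegative (last step) and only approximately
    satisfies the other two assumed conditions.
    """
    for _ in range(n_iter):
        C = project_psd_rank_k(C, k)
        C = project_sum_to_one(C)
        C = project_nonnegative(C)
    return C


if __name__ == "__main__":
    # Synthetic example: C = B A B^T with column-stochastic B (8 words, 2 topics)
    # and a joint-stochastic topic-topic matrix A, plus small symmetric noise.
    rng = np.random.default_rng(0)
    B = rng.dirichlet(np.ones(8), size=2).T
    A = np.array([[0.4, 0.1], [0.1, 0.4]])
    noise = rng.normal(scale=1e-4, size=(8, 8))
    C_noisy = B @ A @ B.T + (noise + noise.T) / 2.0
    C_rect = rectify(C_noisy, k=2)
    print("min entry:", C_rect.min(), "sum:", C_rect.sum())
```

Ending each cycle with the nonnegativity projection guarantees a nonnegative output, at the cost of satisfying the rank and normalization conditions only approximately; a different ordering would trade these guarantees differently.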