Scalable inference of topic evolution via models for latent geometric structures
This work addresses scalability issues in topic evolution analysis for large-scale text data, with an inference method that is several orders of magnitude faster than existing topic modeling approaches.
The paper tackles the problem of learning the temporal dynamics of topic polytopes in topic modeling. It develops a nonparametric Bayesian model and an inference algorithm that discovers new topics over time, runs several orders of magnitude faster than existing approaches, and processes several million documents in under two dozen minutes.
We develop new models and algorithms for learning the temporal dynamics of the topic polytopes and related geometric objects that arise in topic-model-based inference. Our model is nonparametric Bayesian, and the corresponding inference algorithm is able to discover new topics as time progresses. By exploiting the connection between the modeling of topic polytope evolution, the Beta-Bernoulli process, and the Hungarian matching algorithm, our method is shown to be several orders of magnitude faster than existing topic modeling approaches, as demonstrated by experiments on several million documents that run in under two dozen minutes.
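To make the role of the Hungarian matching algorithm concrete, the following is a minimal sketch of one matching step, not the paper's actual inference procedure. It assumes topics are represented as vectors over a shared vocabulary, uses squared Euclidean distance as the matching cost, and replaces the cost that a Beta-Bernoulli process prior would induce with a single fixed threshold. The function name `match_topics` and the parameter `new_topic_cost` are hypothetical and introduced only for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_topics(prev_topics, new_topics, new_topic_cost):
    """Match newly estimated topics to existing ones with the Hungarian algorithm.

    prev_topics: (K, V) array of existing topic vectors.
    new_topics:  (J, V) array of topic vectors estimated at the current time step.
    new_topic_cost: assumed fixed cost of declaring a brand-new topic
        (a stand-in for a prior-derived cost; not taken from the paper).

    Returns a dict mapping each new-topic index to the index of the matched
    existing topic, or to None when the topic is treated as new.
    """
    prev_topics = np.asarray(prev_topics, dtype=float)
    new_topics = np.asarray(new_topics, dtype=float)
    K, J = len(prev_topics), len(new_topics)

    # Squared Euclidean distance between every new topic and every existing topic.
    diff = new_topics[:, None, :] - prev_topics[None, :, :]
    dist = np.sum(diff ** 2, axis=2)  # shape (J, K)

    # Append J dummy columns so that every new topic can opt out of matching
    # and be created as a new topic at a fixed cost.
    dummy = np.full((J, J), float(new_topic_cost))
    cost = np.hstack([dist, dummy])  # shape (J, K + J)

    rows, cols = linear_sum_assignment(cost)
    return {int(j): (int(c) if c < K else None) for j, c in zip(rows, cols)}


# Usage: two existing topics, three new estimates over a 3-word vocabulary.
prev = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
new = [[0.12, 0.10, 0.78], [0.68, 0.22, 0.10], [0.1, 0.8, 0.1]]
print(match_topics(prev, new, new_topic_cost=0.1))
# {0: 1, 1: 0, 2: None}
```

In this toy example the first two estimates are matched to existing topics and the third, which is far from both, is treated as a newly discovered topic. Each matching step is a single linear assignment problem, which illustrates why a matching-based approach can be cheap compared with sampling-based inference.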