WHAI: Weibull Hybrid Autoencoding Inference for Deep Topic Modeling
This work addresses scalable, fast topic modeling for large text corpora. It is an incremental advance that combines two existing inference techniques, stochastic-gradient MCMC and autoencoding variational Bayes.
The paper trains an inference network jointly with a deep generative topic model so that inference scales to big corpora and out-of-sample prediction is fast. The resulting method, WHAI, is shown to be effective and efficient in experiments on big corpora.
To train an inference network jointly with a deep generative topic model, making it both scalable to big corpora and fast in out-of-sample prediction, we develop Weibull hybrid autoencoding inference (WHAI) for deep latent Dirichlet allocation, which infers posterior samples via a hybrid of stochastic-gradient MCMC and autoencoding variational Bayes. The generative network of WHAI has a hierarchy of gamma distributions, while the inference network of WHAI is a Weibull upward-downward variational autoencoder, which integrates a deterministic-upward deep neural network and a stochastic-downward deep generative model based on a hierarchy of Weibull distributions. The Weibull distribution can closely approximate a gamma distribution, has an analytic Kullback-Leibler divergence from it, and admits a simple reparameterization via uniform noise; together these properties make it efficient to compute the gradients of the evidence lower bound with respect to the parameters of the inference network. The effectiveness and efficiency of WHAI are illustrated with experiments on big corpora.
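The two Weibull properties named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `sample_weibull` and `kl_weibull_gamma` are hypothetical, the Weibull is assumed to be parameterized by shape `k` and scale `lam`, and the gamma by shape `alpha` and rate `beta`. The closed-form KL expression below is obtained from the standard Weibull entropy, mean, and expected log under those assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def sample_weibull(k, lam, rng):
    """Draw x ~ Weibull(shape=k, scale=lam) by transforming uniform noise.

    The sample is a deterministic, differentiable function of (k, lam)
    once the noise u is fixed, which is the reparameterization idea.
    """
    u = rng.random()  # u ~ Uniform[0, 1)
    return lam * (-math.log(1.0 - u)) ** (1.0 / k)


def kl_weibull_gamma(k, lam, alpha, beta):
    """Closed-form KL( Weibull(k, lam) || Gamma(shape=alpha, rate=beta) )."""
    return (
        EULER_GAMMA * alpha / k
        - alpha * math.log(lam)
        + math.log(k)
        + beta * lam * math.gamma(1.0 + 1.0 / k)
        - EULER_GAMMA
        - 1.0
        - alpha * math.log(beta)
        + math.lgamma(alpha)
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, for repeatability only
    k, lam = 2.0, 1.5
    draws = [sample_weibull(k, lam, rng) for _ in range(20000)]
    print("sample mean:  ", sum(draws) / len(draws))
    print("analytic mean:", lam * math.gamma(1.0 + 1.0 / k))
    print("KL to Gamma(shape=2, rate=1):", kl_weibull_gamma(k, lam, 2.0, 1.0))
```

As a sanity check on the formula, a Weibull with `k = 1` is an exponential distribution, which coincides with a gamma of shape 1 and rate `1 / lam`, so the divergence in that case should be zero up to floating-point error.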