Dirichlet Variational Autoencoder for Text Modeling
This work proposes an incremental improvement to VAEs for text modeling in natural language processing: adding topic awareness.
The authors introduce a Dirichlet variational autoencoder for text modeling that explicitly models topic information, yielding better text reconstruction and higher classification accuracy on learned representations across four datasets.
We introduce an improved variational autoencoder (VAE) for text modeling in which topic information is explicitly modeled as a Dirichlet latent variable. Topic awareness makes the proposed model better at reconstructing input texts. Furthermore, owing to the inherent interactions between the newly introduced Dirichlet variable and the conventional multivariate Gaussian variable, the model is less prone to KL divergence vanishing. We derive the variational lower bound for the new model and conduct experiments on four data sets. The results show that the proposed model is superior at text reconstruction across the latent space and that classifiers trained on the learned representations achieve higher test accuracies.
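To make the two KL terms mentioned in the abstract concrete, the sketch below computes the closed-form KL divergence of a diagonal Gaussian from a standard normal prior and of one Dirichlet from another, then adds them to a reconstruction loss. It is only an illustration under a simplifying assumption: the posterior and prior are taken to factorize over the Gaussian and Dirichlet variables, so the two KL terms simply add. The bound derived in the paper may differ, since the abstract describes interactions between the two variables. All function names (`digamma`, `kl_gaussian_standard`, `kl_dirichlet`, `negative_elbo`) are hypothetical and not taken from the paper.

```python
import math


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    # Shift x upward until the asymptotic expansion is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 * inv
    result -= inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def kl_gaussian_standard(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) in closed form."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, logvar))


def kl_dirichlet(alpha, beta):
    """KL( Dir(alpha) || Dir(beta) ) in closed form."""
    a0, b0 = sum(alpha), sum(beta)
    log_norm = (math.lgamma(a0) - sum(math.lgamma(a) for a in alpha)
                - math.lgamma(b0) + sum(math.lgamma(b) for b in beta))
    expectation = sum((a - b) * (digamma(a) - digamma(a0))
                      for a, b in zip(alpha, beta))
    return log_norm + expectation


def negative_elbo(recon_nll, mu, logvar, alpha, prior_alpha):
    """Negative lower bound, ASSUMING the two KL terms are additive."""
    return (recon_nll
            + kl_gaussian_standard(mu, logvar)
            + kl_dirichlet(alpha, prior_alpha))
```

As a sanity check, `kl_dirichlet([2.0, 1.0], [1.0, 1.0])` equals ln 2 - 0.5 (about 0.193), which matches the KL divergence of a Beta(2, 1) density from the uniform density on [0, 1]. KL vanishing corresponds to the Gaussian term being driven toward zero during training, so monitoring each term separately is one simple way to observe it.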