Nonparametric Variational Auto-encoders for Hierarchical Representation Learning
This work addresses the limited representational flexibility of VAEs, a problem relevant to machine learning researchers, and offers an incremental improvement by integrating tree-structured Bayesian nonparametric priors.
The paper tackles the limitation that variational autoencoders (VAEs) typically rely on simple priors. It proposes hierarchical nonparametric VAEs, which use tree-structured Bayesian nonparametric priors to enable flexible latent representations, and reports improved clustering accuracy and generalization in video representation learning.
Recently developed variational autoencoders (VAEs) have proved to be an effective confluence of the rich representational power of neural networks with Bayesian methods. However, most work on VAEs uses a rather simple prior over the latent variables, such as a standard normal distribution, thereby restricting their application to relatively simple phenomena. In this work, we propose hierarchical nonparametric variational autoencoders, which combine tree-structured Bayesian nonparametric priors with VAEs to enable infinite flexibility of the latent representation space. Both the neural parameters and the Bayesian priors are learned jointly using tailored variational inference. The resulting model induces a hierarchical structure of latent semantic concepts underlying the data corpus and infers accurate representations of data instances. We apply our model to video representation learning. Our method discovers highly interpretable activity hierarchies and obtains improved clustering accuracy and generalization capacity based on the learned rich representations.
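To make the idea of a tree-structured nonparametric prior over latent codes more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name the specific prior, so the sketch assumes a nested Chinese restaurant process (a well-known tree-structured Bayesian nonparametric prior) with Gaussian node locations. All names (`Node`, `sample_path`, `sample_latent`) and parameters (`gamma`, `scale`, `noise`) are hypothetical. It only draws latent codes from the prior and omits the encoder, the decoder, and variational inference.

```python
import random


class Node:
    """One concept in the tree: a latent-space mean plus child concepts."""

    def __init__(self, mean):
        self.mean = mean      # location of this concept in latent space
        self.count = 0        # number of draws routed through this node
        self.children = []


def sample_child_mean(parent_mean, scale, rng):
    # Assumption: a child concept is placed near its parent with Gaussian jitter.
    return [m + rng.gauss(0.0, scale) for m in parent_mean]


def sample_path(root, depth, gamma, scale, rng):
    """Draw a root-to-leaf path using nested-CRP probabilities.

    At each level an existing child c is chosen with probability
    count_c / (total + gamma) and a new child is created with
    probability gamma / (total + gamma), so the tree can grow without bound.
    """
    node = root
    node.count += 1
    path = [node]
    for _ in range(depth):
        total = sum(child.count for child in node.children)
        r = rng.random() * (total + gamma)
        chosen = None
        for child in node.children:
            r -= child.count
            if r < 0:
                chosen = child
                break
        if chosen is None:
            # No existing child selected: open a new branch.
            chosen = Node(sample_child_mean(node.mean, scale, rng))
            node.children.append(chosen)
        chosen.count += 1
        node = chosen
        path.append(node)
    return path


def sample_latent(root, depth=2, gamma=1.0, scale=1.0, noise=0.1, rng=random):
    """Draw a latent code z centered on the leaf of a sampled path."""
    path = sample_path(root, depth, gamma, scale, rng)
    leaf = path[-1]
    z = [m + rng.gauss(0.0, noise) for m in leaf.mean]
    return z, path
```

For example, creating `root = Node([0.0, 0.0])` and calling `sample_latent(root)` repeatedly grows the tree as needed. Frequently visited branches attract more draws, while the `gamma` term always leaves some probability of opening a new branch. In contrast to a single standard normal prior, each latent code here is tied to a path of progressively more specific concepts.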