ML CV LG

Jul 14, 2020

Relaxed-Responsibility Hierarchical Discrete VAEs

Matthew Willetts, Xenia Miscouridou, Stephen Roberts, Chris Holmes

arXiv:2007.07307v27.55 citations

Originality: Highly original

AI Analysis

This work addresses a key bottleneck in deep generative modeling, the unstable training of hierarchical discrete VAEs, and makes that training more stable, though the method is an incremental refinement of existing relaxed vector-quantisation.

The paper tackles the challenge of training hierarchical discrete Variational Autoencoders (VAEs) by introducing Relaxed-Responsibility Vector-Quantisation, a novel parameterisation method that improves performance and stability, achieving state-of-the-art bits-per-dim results on standard datasets with up to 32 latent layers.

Successfully training Variational Autoencoders (VAEs) with a hierarchy of discrete latent variables remains an area of active research. Vector-Quantised VAEs are a powerful approach to discrete VAEs, but naive hierarchical extensions can be unstable during training. Leveraging insights from classical methods of inference, we introduce Relaxed-Responsibility Vector-Quantisation, a novel way to parameterise discrete latent variables and a refinement of relaxed Vector-Quantisation that gives better performance and more stable training. This enables a novel approach to hierarchical discrete variational autoencoders with numerous layers of latent variables (here up to 32) that we train end-to-end. Among hierarchical probabilistic deep generative models with discrete latent variables trained end-to-end, we achieve state-of-the-art bits-per-dim results for various standard datasets. Unlike discrete VAEs with a single layer of latent variables, we can produce samples by ancestral sampling: it is not essential to train a second autoregressive generative model over the learnt latent representations to then sample from and decode. Moreover, that latter approach would require thousands of forward passes to generate a single sample in these deep hierarchical models. Further, we observe that different layers of our model become associated with different aspects of the data.

