Hierarchical Quantized Autoencoders
This addresses a specific bottleneck in image compression for applications requiring low bitrates, but it is incremental as it builds on existing VQ-VAE methods.
The paper tackles the problem of maintaining perceptual quality and abstract features in neural network-based lossy image compression at very low bitrates by introducing a hierarchical VQ-VAE approach, resulting in high-compression images with retained semantic features as evaluated on CelebA and MNIST datasets.
Despite progress in training neural networks for lossy image compression, current approaches fail to maintain both perceptual quality and abstract features at very low bitrates. Encouraged by recent success in learning discrete representations with Vector Quantized Variational Autoencoders (VQ-VAEs), we motivate the use of a hierarchy of VQ-VAEs to attain high factors of compression. We show that the combination of stochastic quantization and hierarchical latent structure aids likelihood-based image compression. This leads us to introduce a novel objective for training hierarchical VQ-VAEs. Our resulting scheme produces a Markovian series of latent variables that reconstruct images of high-perceptual quality which retain semantically meaningful features. We provide qualitative and quantitative evaluations on the CelebA and MNIST datasets.