LG · CV · May 16, 2022

SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization

arXiv:2205.07547v2 · 103 citations · h-index: 27
AI Analysis

This paper addresses codebook underutilization in discrete representation learning and is aimed at researchers working on generative models. It appears incremental, since it modifies the training scheme rather than introducing a new paradigm.

The paper tackles the codebook collapse problem in VQ-VAE by proposing SQ-VAE, a training scheme with stochastic dequantization and quantization that self-anneals to deterministic quantization, improving codebook utilization and showing superiority over VAE and VQ-VAE in vision and speech tasks.

One noted issue of vector-quantized variational autoencoder (VQ-VAE) is that the learned discrete representation uses only a fraction of the full capacity of the codebook, also known as codebook collapse. We hypothesize that the training scheme of VQ-VAE, which involves some carefully designed heuristics, underlies this issue. In this paper, we propose a new training scheme that extends the standard VAE via novel stochastic dequantization and quantization, called stochastically quantized variational autoencoder (SQ-VAE). In SQ-VAE, we observe a trend that the quantization is stochastic at the initial stage of the training but gradually converges toward a deterministic quantization, which we call self-annealing. Our experiments show that SQ-VAE improves codebook utilization without using common heuristics. Furthermore, we empirically show that SQ-VAE is superior to VAE and VQ-VAE in vision- and speech-related tasks.
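The self-annealing behaviour described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes a Gaussian form in which each latent vector is assigned to a code with probability proportional to exp(-||z - b_k||^2 / (2 * sigma2)), and it treats the variance `sigma2` as a fixed input rather than a quantity learned during training. The function name `stochastic_quantize` and the toy data are hypothetical.

```python
import numpy as np

def stochastic_quantize(z, codebook, sigma2, rng):
    """Sample one codebook index per row of z (illustrative sketch).

    z:        (N, D) array of latent vectors
    codebook: (K, D) array of code vectors
    sigma2:   positive scalar variance (assumed fixed here for illustration)
    Returns (idx, probs): sampled indices (N,) and assignment probabilities (N, K).
    """
    # Squared Euclidean distance from every latent to every code: shape (N, K).
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    # Categorical logits; subtract the row maximum for numerical stability.
    logits = -d2 / (2.0 * sigma2)
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    # Inverse-CDF sampling: count how many cumulative masses lie below u.
    u = rng.random((z.shape[0], 1))
    idx = (probs.cumsum(axis=1) < u).sum(axis=1)
    idx = np.minimum(idx, codebook.shape[0] - 1)  # guard against rounding error
    return idx, probs

def nearest_code(z, codebook):
    """Deterministic nearest-neighbour quantization, as in standard VQ."""
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 2))   # toy codebook: K=8 codes in 2-D
z = rng.normal(size=(1000, 2))       # toy latents
nearest = nearest_code(z, codebook)

# As sigma2 shrinks, sampled indices agree more often with the nearest code.
for sigma2 in (10.0, 1.0, 0.1, 1e-4):
    idx, _ = stochastic_quantize(z, codebook, sigma2, rng)
    print(f"sigma2={sigma2:g}  agreement with nearest code: {(idx == nearest).mean():.3f}")
```

With a large variance the assignment is close to uniform over the codebook, so every code is visited; as the variance goes to zero the categorical distribution concentrates on the nearest code and the sampling reduces to deterministic quantization.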

Code Implementations: 1 repo
