Zero-Variance Gradients for Variational Autoencoders
This work addresses a key bottleneck in training deep generative models, gradient variance, and offers researchers and practitioners a novel approach that combines the stability of analytical computation with deep architectures, although its application to VAEs is incremental.
The paper tackles the problem of gradient variance in training Variational Autoencoders (VAEs) by introducing Silent Gradients, a method that uses specific decoder architectures to compute gradients with zero variance, leading to improved performance over existing estimators across multiple datasets.
Training deep generative models like Variational Autoencoders (VAEs) is often hindered by the need to backpropagate gradients through the stochastic sampling of their latent variables, a process that inherently introduces estimation variance, which can slow convergence and degrade performance. In this paper, we propose a new perspective, which we call Silent Gradients, that sidesteps this problem. Instead of improving stochastic estimators, we leverage specific decoder architectures to compute the expected ELBO analytically, yielding a gradient with zero variance. We first provide a theoretical foundation for this method and demonstrate its superiority over existing estimators in a controlled setting with a linear decoder. To extend our approach to practical use with complex, expressive decoders, we introduce a novel training dynamic that uses the exact, zero-variance gradient to guide the early stages of encoder training before annealing to a standard stochastic estimator. Our experiments show that this technique consistently improves the performance of established baselines, including reparameterization, Gumbel-Softmax, and REINFORCE, across multiple datasets. This work opens a new direction for training generative models by combining the stability of analytical computation with the expressiveness of deep, nonlinear architectures.
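To make the linear-decoder idea concrete, here is a minimal sketch under assumptions of our own, since the abstract does not give the exact model: a Gaussian encoder q(z|x) = N(mu, diag(sigma^2)), a linear-Gaussian decoder p(x|z) = N(Wz + b, s2 I) with fixed variance s2, and a standard normal prior. Under these assumptions the expected ELBO has a closed form, because E||x - Wz - b||^2 = ||x - W mu - b||^2 + sum_j sigma_j^2 ||W[:, j]||^2, so its gradient needs no sampling. The sketch compares that gradient with a single-sample reparameterization estimate, which is unbiased but noisy. All function names, and the linear `blend_weight` schedule standing in for the annealing step, are hypothetical; the discrete-latent baselines (Gumbel-Softmax, REINFORCE) are not covered.

```python
import numpy as np

# Illustrative assumptions (not taken from the paper):
#   encoder q(z|x) = N(mu, diag(sigma^2)) with sigma = exp(log_sigma)
#   decoder p(x|z) = N(W z + b, s2 * I), linear with fixed variance s2
#   prior   p(z)   = N(0, I)


def expected_elbo(x, mu, log_sigma, W, b, s2):
    """Closed-form E_q[log p(x|z)] - KL(q || p) for the linear-Gaussian model."""
    D = x.shape[0]
    sigma2 = np.exp(2.0 * log_sigma)
    r = x - W @ mu - b                      # residual at the posterior mean
    col_sq = np.sum(W ** 2, axis=0)         # ||W[:, j]||^2 for each latent j
    # E||x - W z - b||^2 = ||r||^2 + sum_j sigma_j^2 ||W[:, j]||^2
    exp_sq_err = r @ r + sigma2 @ col_sq
    recon = -0.5 * D * np.log(2.0 * np.pi * s2) - 0.5 * exp_sq_err / s2
    kl = 0.5 * np.sum(sigma2 + mu ** 2 - 1.0 - 2.0 * log_sigma)
    return recon - kl


def exact_elbo_grad(x, mu, log_sigma, W, b, s2):
    """Analytic gradient of expected_elbo w.r.t. (mu, log_sigma).
    Nothing is sampled, so this gradient has zero variance."""
    sigma2 = np.exp(2.0 * log_sigma)
    r = x - W @ mu - b
    col_sq = np.sum(W ** 2, axis=0)
    g_mu = W.T @ r / s2 - mu                          # reconstruction minus KL
    g_ls = -sigma2 * col_sq / s2 - (sigma2 - 1.0)
    return g_mu, g_ls


def reparam_elbo_grad(x, mu, log_sigma, W, b, s2, rng):
    """Single-sample reparameterization estimate (KL term kept analytic)."""
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal(mu.shape)
    z = mu + sigma * eps                              # reparameterized sample
    g_z = W.T @ (x - W @ z - b) / s2                  # d log p(x|z) / dz
    g_mu = g_z - mu
    g_ls = g_z * sigma * eps - (sigma ** 2 - 1.0)
    return g_mu, g_ls


def blend_weight(step, anneal_steps):
    """Hypothetical linear schedule: weight 1.0 on the exact gradient at
    step 0, decaying to 0.0 (stochastic estimator only) at anneal_steps."""
    return max(0.0, 1.0 - step / anneal_steps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, K, s2 = 5, 3, 0.5
    W, b, x = rng.standard_normal((D, K)), rng.standard_normal(D), rng.standard_normal(D)
    mu, log_sigma = rng.standard_normal(K), 0.1 * rng.standard_normal(K)

    g_mu, g_ls = exact_elbo_grad(x, mu, log_sigma, W, b, s2)
    draws = [reparam_elbo_grad(x, mu, log_sigma, W, b, s2, rng) for _ in range(50_000)]
    mc_mu = np.array([d[0] for d in draws])
    mc_ls = np.array([d[1] for d in draws])
    print("exact d/dmu:              ", g_mu)
    print("reparam mean d/dmu:       ", mc_mu.mean(axis=0))
    print("reparam var d/dmu:        ", mc_mu.var(axis=0))
    print("exact d/dlog_sigma:       ", g_ls)
    print("reparam mean d/dlog_sigma:", mc_ls.mean(axis=0))
```

The Monte Carlo mean of the reparameterization estimate should approach the exact gradient while its per-coordinate variance stays positive, which is the gap the zero-variance gradient removes. In a training loop one could mix the two as `w * g_exact + (1 - w) * g_stoch` with `w = blend_weight(step, anneal_steps)`; this blending rule is our assumption, and with a nonlinear decoder the exact term would require some analytically tractable component that the abstract does not describe.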