NE · Jul 19, 2016

Stochastic Backpropagation through Mixture Density Distributions

arXiv:1607.05690v1 · 48 citations
Originality: Incremental advance
AI Analysis

This work removes a technical bottleneck for machine learning researchers and practitioners who need to train models with mixture density distributions. It is an incremental advance because it extends existing reparameterization methods rather than replacing them.

The paper tackles the problem of backpropagating gradients through mixture density distributions, which was previously difficult because the discrete distribution over mixture weights is hard to reparameterize. It introduces an alternative transform that gives an unbiased estimator of the derivatives with respect to the mixture weights, so that models such as variational autoencoders with mixture-distributed latent variables can be trained by stochastic backpropagation.

The ability to backpropagate stochastic gradients through continuous latent distributions has been crucial to the emergence of variational autoencoders and stochastic gradient variational Bayes. The key ingredient is an unbiased and low-variance way of estimating gradients with respect to distribution parameters from gradients evaluated at distribution samples. The "reparameterization trick" provides a class of transforms yielding such estimators for many continuous distributions, including the Gaussian and other members of the location-scale family. However, the trick does not readily extend to mixture density models, due to the difficulty of reparameterizing the discrete distribution over mixture weights. This report describes an alternative transform, applicable to any continuous multivariate distribution with a differentiable density function from which samples can be drawn, and uses it to derive an unbiased estimator for mixture density weight derivatives. Combined with the reparameterization trick applied to the individual mixture components, this estimator makes it straightforward to train variational autoencoders with mixture-distributed latent variables, or to perform stochastic variational inference with a mixture density variational posterior.
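
The combination described in the abstract can be illustrated with a small univariate sketch. It is not taken from the report and should be read as one plausible instance under stated assumptions: the mixture is one-dimensional with Gaussian components, the weights are parameterized by softmax logits, and the weight derivative is obtained by implicitly differentiating the mixture CDF F(x) = u at the sample x, which gives dx/d(logit_j) = -pi_j (F_j(x) - F(x)) / p(x). The component means and scales use the ordinary reparameterization trick on the sampled component. All function and variable names are hypothetical, and the test function f(x) = x^2 is chosen only because its expectation has a closed form to compare against.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def softmax(logits):
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_gradients(logits, mus, sigmas, df, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the gradient of E[f(x)] for a 1-D Gaussian mixture.

    df is the derivative f'(x). Returns gradients with respect to the
    logits, the component means, and the component standard deviations.
    """
    rng = random.Random(seed)
    pis = softmax(logits)
    ks = range(len(pis))
    g_logits = [0.0 for _ in ks]
    g_mus = [0.0 for _ in ks]
    g_sigmas = [0.0 for _ in ks]
    for _ in range(n_samples):
        # Draw a sample: pick a component, then reparameterize within it.
        k = rng.choices(ks, weights=pis)[0]
        eps = rng.gauss(0.0, 1.0)
        x = mus[k] + sigmas[k] * eps
        dfx = df(x)
        # Component parameters: standard reparameterization trick,
        # dx/dmu_k = 1 and dx/dsigma_k = eps for the sampled component only.
        g_mus[k] += dfx
        g_sigmas[k] += dfx * eps
        # Mixture weights (assumed form): implicit derivative of the mixture CDF.
        cdfs = [normal_cdf(x, m, s) for m, s in zip(mus, sigmas)]
        mix_cdf = sum(p * c for p, c in zip(pis, cdfs))
        mix_pdf = sum(p * normal_pdf(x, m, s) for p, m, s in zip(pis, mus, sigmas))
        for j in ks:
            dx_dlogit = -pis[j] * (cdfs[j] - mix_cdf) / mix_pdf
            g_logits[j] += dfx * dx_dlogit
    n = float(n_samples)
    return ([g / n for g in g_logits],
            [g / n for g in g_mus],
            [g / n for g in g_sigmas])


# Illustrative parameters (arbitrary choices, not from the report).
logits = [0.0, 0.5]
mus = [-1.0, 1.5]
sigmas = [1.0, 0.8]

# f(x) = x^2, so f'(x) = 2x.
g_logits, g_mus, g_sigmas = estimate_gradients(logits, mus, sigmas, lambda x: 2.0 * x)

# Closed-form gradients of E[x^2] = sum_k pi_k (mu_k^2 + sigma_k^2) for comparison.
pis = softmax(logits)
second_moments = [m * m + s * s for m, s in zip(mus, sigmas)]
mix_second = sum(p * q for p, q in zip(pis, second_moments))
exact_logits = [p * (q - mix_second) for p, q in zip(pis, second_moments)]
exact_mus = [2.0 * p * m for p, m in zip(pis, mus)]
exact_sigmas = [2.0 * p * s for p, s in zip(pis, sigmas)]

print("logit grads:", g_logits, "exact:", exact_logits)
print("mu grads:   ", g_mus, "exact:", exact_mus)
print("sigma grads:", g_sigmas, "exact:", exact_sigmas)
```

The weight gradients are computed from the same samples as the component gradients, with no score-function term, which is the property the abstract emphasizes. The multivariate case treated in the report needs more than this one-dimensional CDF argument and is not covered by the sketch.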

