LG | Feb 8, 2021

Oops I Took A Gradient: Scalable Sampling for Discrete Distributions

Will Grathwohl, Kevin Swersky, Milad Hashemi, David Duvenaud, Chris J. Maddison

arXiv:2102.04509v2 | 28.5 | 121 citations | Has Code

Originality: Highly original

AI Analysis

This work provides a more efficient and scalable sampling method for discrete probabilistic models, which benefits researchers and practitioners who work with such models, particularly when training deep energy-based models.

This paper introduces a scalable approximate sampling strategy for probabilistic models with discrete variables, utilizing gradients of the likelihood function to propose updates in a Metropolis-Hastings sampler. The approach empirically outperforms generic samplers in various models (Ising, Potts, RBMs, FHMMs) and also surpasses variational auto-encoders and existing energy-based models when training deep energy-based models on high-dimensional discrete data.

We propose a general and scalable approximate sampling strategy for probabilistic models with discrete variables. Our approach uses gradients of the likelihood function with respect to its discrete inputs to propose updates in a Metropolis-Hastings sampler. We show empirically that this approach outperforms generic samplers in a number of difficult settings including Ising models, Potts models, restricted Boltzmann machines, and factorial hidden Markov models. We also demonstrate the use of our improved sampler for training deep energy-based models on high dimensional discrete data. This approach outperforms variational auto-encoders and existing energy-based models. Finally, we give bounds showing that our approach is near-optimal in the class of samplers which propose local updates.
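The abstract's core idea, using the gradient of the log-likelihood with respect to the discrete inputs to decide which variable a Metropolis-Hastings step should change, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from this page or from the authors' code: the binary quadratic (Ising-like) model, the function names, the first-order estimate `d` of the change in log-probability from flipping each bit, and the proposal `softmax(d / 2)`.

```python
import numpy as np

def log_prob(x, W, b):
    # Unnormalized log-probability of an assumed binary quadratic model.
    return x @ W @ x + b @ x

def grad_log_prob(x, W, b):
    # Gradient w.r.t. x, treating x as continuous (assumes W is symmetric).
    return 2.0 * W @ x + b

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def flip_scores(x, W, b):
    # First-order estimate of log_prob(x with bit i flipped) - log_prob(x).
    # Flipping bit i changes x[i] by (1 - 2 * x[i]) = -(2 * x[i] - 1).
    return -(2.0 * x - 1.0) * grad_log_prob(x, W, b)

def gradient_informed_mh_step(x, W, b, rng):
    # Propose which bit to flip, favoring flips the gradient says help.
    q_fwd = softmax(flip_scores(x, W, b) / 2.0)
    i = rng.choice(len(x), p=q_fwd)
    x_new = x.copy()
    x_new[i] = 1.0 - x_new[i]
    # Probability of proposing the reverse move (flipping bit i back).
    q_rev = softmax(flip_scores(x_new, W, b) / 2.0)
    # Metropolis-Hastings correction keeps the target distribution invariant.
    log_accept = (log_prob(x_new, W, b) - log_prob(x, W, b)
                  + np.log(q_rev[i]) - np.log(q_fwd[i]))
    if np.log(rng.random()) < log_accept:
        return x_new, True
    return x, False

# Small demo on a random model (hypothetical sizes and scales).
rng = np.random.default_rng(0)
D = 8
A = rng.normal(scale=0.3, size=(D, D))
W = (A + A.T) / 2.0
np.fill_diagonal(W, 0.0)  # no self-interaction
b = rng.normal(scale=0.1, size=D)
x = rng.integers(0, 2, size=D).astype(float)

accepted = 0
for _ in range(1000):
    x, ok = gradient_informed_mh_step(x, W, b, rng)
    accepted += ok
print("acceptance rate:", accepted / 1000)
```

For this particular model with a zero diagonal in `W`, the first-order flip estimate happens to equal the true change in log-probability, so the sketch mainly shows the mechanics of the proposal and the accept/reject correction. For deeper models the estimate would only be approximate, which is why the acceptance test is still needed.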
