Lost Relatives of the Gumbel Trick
Aimed at researchers in machine learning and statistics, this work addresses computational bottlenecks in discrete graphical models and offers incremental improvements over the Gumbel trick.
The paper tackles two problems: sampling from discrete probability distributions and estimating their partition functions. It derives a family of methods related to the Gumbel trick and shows that the new methods have superior properties in several settings at minimal additional computational cost. These properties include new bounds on the log partition function and a family of sequential samplers.
The Gumbel trick is a method to sample from a discrete probability distribution, or to estimate its normalizing partition function. The method relies on repeatedly applying a random perturbation to the distribution in a particular way, each time solving for the most likely configuration. We derive an entire family of related methods, of which the Gumbel trick is one member, and show that the new methods have superior properties in several settings with minimal additional computational cost. In particular, for the Gumbel trick to yield computational benefits for discrete graphical models, Gumbel perturbations on all configurations are typically replaced with so-called low-rank perturbations. We show how a subfamily of our new methods adapts to this setting, proving new upper and lower bounds on the log partition function and deriving a family of sequential samplers for the Gibbs distribution. Finally, we balance the discussion by showing how the simpler analytical form of the Gumbel trick enables additional theoretical results.
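To make the mechanics concrete, here is a minimal brute-force sketch. It assumes a model small enough to enumerate every configuration; in practice the argmax would come from a MAP solver. The model `phi` and the function names `gumbel_trick` and `low_rank_upper_bound` are hypothetical and chosen only for illustration. The sketch relies on a standard fact: if an independent standard Gumbel is added to every log-potential phi(x), the argmax is an exact sample from exp(phi(x))/Z, and the maximum is itself Gumbel-distributed with location log Z. Subtracting the Euler-Mascheroni constant from the mean of the maxima therefore estimates log Z. The "exponential" estimate reuses the same maxima. It rests on the fact that exp(-max) is exponentially distributed with rate Z, and it is included only as one simple related estimator, not as the exact form derived in the paper. The low-rank function draws one zero-mean Gumbel per (variable, value) pair instead of one per configuration, and estimates the classic upper bound on log Z from earlier perturbation work. It does not implement the new bounds proved here.

```python
import itertools
import math
import random

EULER_GAMMA = 0.5772156649015329  # mean of a standard Gumbel(0, 1) variable


def gumbel(rng):
    """Standard Gumbel(0, 1) sample via inverse CDF: -log(-log U)."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0; avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_trick(phi, num_samples, seed=0):
    """Full-rank Gumbel trick on a dict phi: configuration -> log-potential.

    Each round perturbs every configuration with an independent Gumbel and
    solves for the most likely perturbed configuration (brute force here).
    Returns empirical sample frequencies and two estimates of log Z.
    """
    rng = random.Random(seed)
    counts = dict.fromkeys(phi, 0)
    maxima = []
    for _ in range(num_samples):
        perturbed = {x: p + gumbel(rng) for x, p in phi.items()}
        x_star = max(perturbed, key=perturbed.get)
        counts[x_star] += 1
        maxima.append(perturbed[x_star])
    # The perturbed max is Gumbel with location log Z, so its mean is log Z + gamma.
    log_z_gumbel = sum(maxima) / num_samples - EULER_GAMMA
    # exp(-max) is Exponential with rate Z, so its sample mean estimates 1/Z.
    log_z_exp = -math.log(sum(math.exp(-m) for m in maxima) / num_samples)
    freqs = {x: c / num_samples for x, c in counts.items()}
    return freqs, log_z_gumbel, log_z_exp


def low_rank_upper_bound(phi, num_vars, num_samples, seed=0):
    """Monte Carlo estimate of E[max_x phi(x) + sum_i g_i(x_i)] for binary x.

    Only one zero-mean Gumbel per (variable, value) pair is drawn; in
    expectation this quantity is an upper bound on log Z.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        g = [[gumbel(rng) - EULER_GAMMA for _ in range(2)] for _ in range(num_vars)]
        total += max(p + sum(g[i][xi] for i, xi in enumerate(x)) for x, p in phi.items())
    return total / num_samples


# Hypothetical two-variable binary model that prefers agreeing values.
phi = {x: (2.0 if x[0] == x[1] else 0.0) for x in itertools.product((0, 1), repeat=2)}
log_z = math.log(sum(math.exp(p) for p in phi.values()))
exact = {x: math.exp(p - log_z) for x, p in phi.items()}

freqs, log_z_gumbel, log_z_exp = gumbel_trick(phi, num_samples=20000)
bound = low_rank_upper_bound(phi, num_vars=2, num_samples=20000)
print(f"exact log Z        {log_z:.3f}")
print(f"Gumbel estimate    {log_z_gumbel:.3f}")
print(f"exponential est.   {log_z_exp:.3f}")
print(f"low-rank bound     {bound:.3f}")
```

Both full-rank estimates should land close to the exact log Z, and the sample frequencies should approach the exact probabilities. Up to Monte Carlo error, the low-rank average should sit at or above log Z, because the low-rank perturbation needs far fewer random variables but only yields a bound.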