Improved Gradient-Based Optimization Over Discrete Distributions
This work addresses gradient estimation for discrete optimization, a challenge for machine learning researchers and practitioners, though the contribution appears incremental because it builds on existing estimators.
The paper tackles gradient estimation for optimizing expectations over discrete distributions. It shows that the Gumbel-Softmax estimator is biased and proposes methods to reduce that bias, which improves performance on variational inference and binary optimization tasks.
In many applications we seek to maximize an expectation with respect to a distribution over discrete variables. Estimating gradients of such objectives with respect to the distribution parameters is a challenging problem. We analyze existing solutions, including finite-difference (FD) estimators and continuous relaxation (CR) estimators, in terms of bias and variance. We show that the commonly used Gumbel-Softmax estimator is biased and propose a simple method to reduce its bias. We also derive a simpler piecewise linear continuous relaxation with similarly reduced bias. We demonstrate empirically that reduced bias leads to better performance in variational inference and on binary optimization tasks.
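To make the bias claim concrete, the following is a minimal sketch for a single binary variable, not the paper's method or experimental setup. It compares the exact gradient of E[f(x)] for x ~ Bernoulli(sigmoid(theta)) with a Monte Carlo average of the gradient through the binary Gumbel-Softmax (Concrete) relaxation. The objective f, the temperature tau, the sample count, and all function names are assumptions chosen only for illustration.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def f(x):
    # Hypothetical test objective, chosen only for illustration.
    return (x - 0.45) ** 2


def f_prime(x):
    # Derivative of the hypothetical objective above.
    return 2.0 * (x - 0.45)


def exact_gradient(theta):
    # d/dtheta E[f(x)] for x ~ Bernoulli(p), p = sigmoid(theta):
    # E[f(x)] = p f(1) + (1 - p) f(0), and dp/dtheta = p (1 - p).
    p = sigmoid(theta)
    return p * (1.0 - p) * (f(1.0) - f(0.0))


def relaxed_gradient(theta, tau, num_samples=200_000, seed=0):
    # Monte Carlo average of d f(z) / d theta, where
    # z = sigmoid((theta + logistic noise) / tau) is the binary relaxation.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        # Clamp away from 0 and 1 so the logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noise = math.log(u) - math.log(1.0 - u)
        z = sigmoid((theta + noise) / tau)
        total += f_prime(z) * z * (1.0 - z) / tau
    return total / num_samples


exact = exact_gradient(0.0)
relaxed = relaxed_gradient(0.0, tau=1.0)
print(f"exact: {exact:.4f}  relaxed: {relaxed:.4f}  gap: {relaxed - exact:+.4f}")
```

Under these assumptions, with theta = 0 and tau = 1 the relaxed sample z is uniform on (0, 1), so the relaxed gradient has expectation 1/60 while the exact gradient is 0.025. The gap does not shrink with more samples, which is what distinguishes bias from variance.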