LG, Oct 31, 2017

Backpropagation through the Void: Optimizing control variates for black-box gradient estimation

arXiv:1711.00123v3, 314 citations, Has Code
Originality: Highly original
AI Analysis

The paper addresses a key bottleneck in deep learning and reinforcement learning: estimating gradients when the function being optimized is non-differentiable or unknown. It offers a general solution with potentially broad impact.

The paper tackles high-variance or biased gradient estimation in black-box optimization. It introduces a framework for learning low-variance, unbiased gradient estimators, demonstrates it on training discrete latent-variable models, and extends it to reinforcement learning algorithms.

Gradient-based optimization is the foundation of deep learning and reinforcement learning. Even when the mechanism being optimized is unknown or not differentiable, optimization using high-variance or biased gradient estimates is still often the best strategy. We introduce a general framework for learning low-variance, unbiased gradient estimators for black-box functions of random variables. Our method uses gradients of a neural network trained jointly with model parameters or policies, and is applicable in both discrete and continuous settings. We demonstrate this framework for training discrete latent-variable models. We also give an unbiased, action-conditional extension of the advantage actor-critic reinforcement learning algorithm.
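
To make the idea of an unbiased, low-variance estimator concrete, here is a minimal sketch of a control-variate gradient estimator for a single Gaussian variable. It is an illustration under stated assumptions, not the authors' implementation: the toy objective f, the constant TARGET, the quadratic surrogate, and its fixed coefficient coef are all hypothetical. The paper uses a neural network as the surrogate and trains it jointly with the model parameters, whereas this sketch keeps the surrogate fixed. Each sample combines a score-function term on the residual f(b) - c(b) with the gradient of the surrogate through the reparameterized sample, so the estimate stays unbiased for any choice of c.

```python
import random

TARGET = 0.5  # hypothetical constant defining the toy objective


def f(b):
    # Treated as a black box: it is only evaluated, never differentiated.
    return (b - TARGET) ** 2


def surrogate(b, coef):
    # Hypothetical differentiable control variate c(b) with a fixed coefficient.
    return coef * (b - TARGET) ** 2


def surrogate_grad(b, coef):
    # dc/db, which equals dc/dtheta here because b = theta + sigma * eps.
    return 2.0 * coef * (b - TARGET)


def gradient_sample(theta, sigma, coef, rng):
    eps = rng.gauss(0.0, 1.0)
    b = theta + sigma * eps              # reparameterized sample from N(theta, sigma^2)
    score = (b - theta) / sigma ** 2     # d/dtheta of log N(b; theta, sigma^2)
    # Score-function term on the residual plus the surrogate's pathwise gradient.
    return (f(b) - surrogate(b, coef)) * score + surrogate_grad(b, coef)


def estimate(theta, sigma, coef, n, seed):
    # Returns the sample mean and sample variance of n single-sample estimates.
    rng = random.Random(seed)
    samples = [gradient_sample(theta, sigma, coef, rng) for _ in range(n)]
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean, var


theta, sigma, n = 2.0, 1.0, 100_000
true_grad = 2.0 * (theta - TARGET)  # d/dtheta E[(b - TARGET)^2] = 2 (theta - TARGET)

plain_mean, plain_var = estimate(theta, sigma, 0.0, n, seed=0)  # coef = 0: plain score-function estimator
cv_mean, cv_var = estimate(theta, sigma, 0.9, n, seed=1)        # coef = 0.9: with the control variate

print("true gradient:", true_grad)
print("score-function only: mean", plain_mean, "variance", plain_var)
print("with control variate: mean", cv_mean, "variance", cv_var)
```

Both estimators should average close to the true gradient, while the one with the control variate shows a much smaller per-sample variance. In the paper's setting the surrogate's parameters would themselves be optimized to reduce that variance rather than set by hand.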

Code Implementations: 7 repos

Data from Papers with Code (CC-BY-SA-4.0)

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
