Stochastic Variational Optimization
This work offers incremental insights for optimization researchers, focusing on parallelizable gradient estimation techniques.
The paper tackles the problem of parallelizing stochastic gradient descent by analyzing Variational Optimization and its approximations, and concludes that Directional Derivatives are preferable for differentiable objectives.
Variational Optimization forms a differentiable upper bound on an objective. We show that approaches such as Natural Evolution Strategies and Gaussian Perturbation are special cases of Variational Optimization in which the expectations are approximated by Gaussian sampling. These approaches are of particular interest because they are parallelizable. We calculate the approximate bias and variance of the corresponding gradient estimators and demonstrate that using antithetic sampling or a baseline is crucial to mitigate the problems of these estimators. We contrast these methods with an alternative parallelizable method, namely Directional Derivatives. We conclude that, for differentiable objectives, using Directional Derivatives is preferable to using Variational Optimization to perform parallel Stochastic Gradient Descent.
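
To make the comparison concrete, the following is a minimal sketch of the estimators discussed above, not the implementation used in the paper. It assumes an isotropic Gaussian search distribution with mean mu and fixed standard deviation sigma, for which the sampled gradient of the Variational Optimization bound with respect to mu is the average of f(mu + sigma * eps) * eps / sigma over eps ~ N(0, I). The quadratic objective f, the function names vo_gradient and directional_gradient, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def f(x):
    # Hypothetical test objective (an assumption, not from the paper):
    # a quadratic with its minimum at x = 1.
    return 0.5 * np.sum((x - 1.0) ** 2, axis=-1)


def grad_f(x):
    # Exact gradient of the hypothetical objective above.
    return x - 1.0


def vo_gradient(mu, sigma, n_samples, rng, antithetic=False, baseline=False):
    """Sampled gradient of E[f(mu + sigma * eps)] with respect to mu."""
    eps = rng.standard_normal((n_samples, mu.size))
    if antithetic:
        # Pair each perturbation with its mirror image.
        diff = f(mu + sigma * eps) - f(mu - sigma * eps)
        return (diff[:, None] * eps).mean(axis=0) / (2.0 * sigma)
    values = f(mu + sigma * eps)
    if baseline:
        # Subtracting f(mu) leaves the expectation unchanged because
        # E[eps] = 0, but removes a large constant from every sample.
        values = values - f(mu)
    return (values[:, None] * eps).mean(axis=0) / sigma


def directional_gradient(mu, n_samples, rng):
    """Average of directional derivatives along random Gaussian directions."""
    eps = rng.standard_normal((n_samples, mu.size))
    slopes = eps @ grad_f(mu)  # derivative of f along each direction
    return (slopes[:, None] * eps).mean(axis=0)


if __name__ == "__main__":
    mu = np.zeros(5)
    settings = {
        "plain": dict(),
        "baseline": dict(baseline=True),
        "antithetic": dict(antithetic=True),
    }
    for name, kwargs in settings.items():
        g = vo_gradient(mu, 0.1, 20000, np.random.default_rng(0), **kwargs)
        print(name, np.linalg.norm(g - grad_f(mu)))
    g = directional_gradient(mu, 20000, np.random.default_rng(0))
    print("directional", np.linalg.norm(g - grad_f(mu)))
```

Each sample in these estimators can be evaluated independently, which is what makes them parallelizable. In the sketch, the plain estimator carries a term of size f(mu) / sigma in every sample, so its error should be noticeably larger than that of the baseline or antithetic variants at the same sample count. For this particular quadratic, the antithetic difference equals 2 * sigma times the directional derivative exactly, so the antithetic and directional estimates coincide when they share the same samples. That coincidence is a property of the assumed objective and should not be expected in general.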