ML LG ST · Jun 10, 2020

Variance reduction for Random Coordinate Descent-Langevin Monte Carlo

arXiv:2006.06068v42.71 citations

Originality: Incremental advance

AI Analysis

The paper addresses the computational bottleneck of high-dimensional sampling in Bayesian statistics and machine learning, offering a practical improvement over existing methods.

The paper tackles the high computational cost of Langevin Monte Carlo (LMC) methods for sampling from log-concave distributions. It introduces a variance reduction technique, Randomized Coordinates Averaging Descent (RCAD), which reduces the variance of random gradient approximations. The resulting methods, RCAD-O-LMC and RCAD-U-LMC, converge within the same number of iterations as classical LMC while lowering the per-iteration cost.

Sampling from a log-concave distribution function is a core problem with wide applications in Bayesian statistics and machine learning. While most gradient-free methods converge slowly, Langevin Monte Carlo (LMC), which converges quickly, requires the computation of gradients. In practice one uses finite-difference approximations as surrogates, and the method is expensive in high dimensions. A natural strategy to reduce the computational cost of each iteration is to use random gradient approximations, such as random coordinate descent (RCD) or simultaneous perturbation stochastic approximation (SPSA). We show by a counter-example that blindly applying RCD does not achieve the goal in the most general setting. The high variance induced by the randomness means that more iterations are needed, and this cancels out the saving in each iteration. We then introduce a new variance reduction approach, termed Randomized Coordinates Averaging Descent (RCAD), and incorporate it into both overdamped and underdamped LMC. The resulting methods are termed RCAD-O-LMC and RCAD-U-LMC, respectively. They still sit within the random gradient approximation framework, so the computational cost of each iteration is low. However, because RCAD reduces the variance, the methods converge within the same number of iterations as the classical overdamped and underdamped LMC. This leads to an overall computational saving.
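
To make the contrast in the abstract concrete, here is a minimal Python sketch of overdamped LMC with two gradient surrogates built from finite differences. The abstract does not spell out the RCAD update, so the second sampler rests on an assumption: it keeps a stored gradient vector and refreshes one randomly chosen coordinate per step. The names `partial_fd`, `rcd_lmc`, `rcad_style_lmc`, the step size `h`, and the finite-difference width `eps` are all illustrative, and only the overdamped case is shown.

```python
import numpy as np

def partial_fd(f, x, i, eps=1e-5):
    """Central finite-difference estimate of the i-th partial derivative of f at x."""
    e = np.zeros_like(x)
    e[i] = eps
    return (f(x + e) - f(x - e)) / (2.0 * eps)

def rcd_lmc(f, x0, h, n_steps, rng):
    """Overdamped LMC with a plain random-coordinate gradient surrogate."""
    x = np.array(x0, dtype=float)
    d = x.size
    for _ in range(n_steps):
        r = rng.integers(d)
        g = np.zeros(d)
        # Scaling by d makes the surrogate unbiased, but its variance is high.
        g[r] = d * partial_fd(f, x, r)
        x = x - h * g + np.sqrt(2.0 * h) * rng.standard_normal(d)
    return x

def rcad_style_lmc(f, x0, h, n_steps, rng):
    """Overdamped LMC with a stored gradient, one coordinate refreshed per step.

    ASSUMPTION: this is one plausible reading of RCAD, not the paper's stated algorithm.
    """
    x = np.array(x0, dtype=float)
    d = x.size
    # Pay for the full finite-difference gradient once, at initialization.
    g = np.array([partial_fd(f, x, i) for i in range(d)])
    for _ in range(n_steps):
        x = x - h * g + np.sqrt(2.0 * h) * rng.standard_normal(d)
        r = rng.integers(d)
        g[r] = partial_fd(f, x, r)  # other entries keep their older values
    return x

# Usage: the target density is proportional to exp(-f), here a standard Gaussian.
f = lambda x: 0.5 * float(np.dot(x, x))
rng = np.random.default_rng(0)
x_rcd = rcd_lmc(f, np.ones(5), h=0.01, n_steps=200, rng=rng)
x_rcad = rcad_style_lmc(f, np.ones(5), h=0.01, n_steps=200, rng=rng)
```

After initialization, both samplers call `f` twice per iteration, compared with `2 * d` calls for a full central-difference gradient. They differ only in how the single new partial derivative is combined with past information.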
