LG, OC, ML | Mar 23, 2021

Adaptive Importance Sampling for Finite-Sum Optimization and Sampling with Decreasing Step-Sizes

arXiv:2103.12243v1 | 17 citations
AI Analysis

This work addresses variance reduction for finite-sum optimization and sampling, offering incremental improvements in convergence rates for machine learning practitioners.

The paper addresses gradient-estimator variance in stochastic optimization and sampling. It proposes Avare, an adaptive importance sampling algorithm that achieves $\mathcal{O}(T^{2/3})$ dynamic regret for SGD and $\mathcal{O}(T^{5/6})$ for SGLD when run with $\mathcal{O}(1/t)$ decreasing step sizes.

Reducing the variance of the gradient estimator is known to improve the convergence rate of stochastic gradient-based optimization and sampling algorithms. One way of achieving variance reduction is to design importance sampling strategies. Recently, the problem of designing such schemes was formulated as an online learning problem with bandit feedback, and algorithms with sub-linear static regret were designed. In this work, we build on this framework and propose Avare, a simple and efficient algorithm for adaptive importance sampling for finite-sum optimization and sampling with decreasing step-sizes. Under standard technical conditions, we show that Avare achieves $\mathcal{O}(T^{2/3})$ and $\mathcal{O}(T^{5/6})$ dynamic regret for SGD and SGLD respectively when run with $\mathcal{O}(1/t)$ step sizes. We achieve this dynamic regret bound by leveraging our knowledge of the dynamics defined by the algorithm, and combining ideas from online learning and variance-reduced stochastic optimization. We validate empirically the performance of our algorithm and identify settings in which it leads to significant improvements.
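To make the setting concrete, here is a minimal sketch of importance-sampled SGD on a finite sum with $\mathcal{O}(1/t)$ step sizes. It is not the Avare algorithm, whose update rule is not given in this summary. The toy objective $f_i(x) = \tfrac{1}{2}(a_i x - b_i)^2$, the helper names (`grad_i`, `mixed_probs`, `is_gradient`, `sgd_importance`), the uniform mixing weight, and the heuristic of setting each sampling weight to the last observed gradient magnitude are all assumptions chosen for illustration. The one standard ingredient is the reweighting $g_i / (n p_i)$, which keeps the estimator unbiased for any distribution with $p_i > 0$.

```python
import random

def grad_i(x, a, b):
    """Gradient of the toy component f_i(x) = 0.5 * (a*x - b)**2 (scalar x)."""
    return a * (a * x - b)

def mixed_probs(weights, mix=0.2):
    """Probabilities proportional to weights, mixed with uniform so every p_i > 0."""
    n = len(weights)
    total = sum(weights)
    if total == 0.0:
        return [1.0 / n] * n
    return [(1.0 - mix) * w / total + mix / n for w in weights]

def is_gradient(x, data, probs, rng):
    """Sample one index i ~ probs; return (i, g_i, g_i / (n * p_i)).

    The last entry is an unbiased estimate of the average gradient.
    """
    n = len(data)
    i = rng.choices(range(n), weights=probs, k=1)[0]
    a, b = data[i]
    g = grad_i(x, a, b)
    return i, g, g / (n * probs[i])

def sgd_importance(data, x0=0.0, steps=2000, seed=0):
    """SGD with heuristic adaptive sampling (illustrative only, not Avare)."""
    rng = random.Random(seed)
    n = len(data)
    # Strong-convexity constant of the averaged toy objective: mean of a_i^2.
    mu = sum(a * a for a, _ in data) / n
    # Bandit feedback: only the sampled index's gradient magnitude is observed.
    weights = [1.0] * n
    x = x0
    for t in range(1, steps + 1):
        probs = mixed_probs(weights)
        i, g, estimate = is_gradient(x, data, probs, rng)
        weights[i] = abs(g)          # refresh the weight of the sampled index only
        x -= estimate / (mu * t)     # decreasing O(1/t) step size
    return x
```

Only the sampled index's weight is refreshed at each step, which mirrors the bandit-feedback constraint mentioned in the abstract: the sampler never sees the gradients it did not draw. The uniform mixing term bounds each $p_i$ away from zero, so the importance weight $1/(n p_i)$ stays bounded.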

