OCLGMLMar 23, 2021

Stochastic Reweighted Gradient Descent

arXiv:2103.12293v110 citations
AI Analysis

This work addresses a bottleneck in optimization algorithms for machine learning practitioners, offering a more efficient alternative to existing variance-reduced methods, though it is incremental as it does not achieve the linear convergence rates of control variates methods.

The authors tackled the problem of variance reduction in finite-sum optimization by proposing SRG, an importance-sampling-based algorithm that avoids memory overhead and full gradient computations, showing it provably outperforms SGD in the strongly-convex case.

Despite the strong theoretical guarantees that variance-reduced finite-sum optimization algorithms enjoy, their applicability remains limited to cases where the memory overhead they introduce (SAG/SAGA), or the periodic full gradient computation they require (SVRG/SARAH) are manageable. A promising approach to achieving variance reduction while avoiding these drawbacks is the use of importance sampling instead of control variates. While many such methods have been proposed in the literature, directly proving that they improve the convergence of the resulting optimization algorithm has remained elusive. In this work, we propose an importance-sampling-based algorithm we call SRG (stochastic reweighted gradient). We analyze the convergence of SRG in the strongly-convex case and show that, while it does not recover the linear rate of control variates methods, it provably outperforms SGD. We pay particular attention to the time and memory overhead of our proposed method, and design a specialized red-black tree allowing its efficient implementation. Finally, we present empirical results to support our findings.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes