LGOCNov 26, 2021

Random-reshuffled SARAH does not need a full gradient computations

arXiv:2111.13322v211 citations
Originality Incremental advance
AI Analysis

This work addresses a computational bottleneck in variance-reduced stochastic optimization for machine learning practitioners, representing an incremental improvement over existing methods.

The paper tackles the computational inefficiency of the SARAH algorithm by eliminating the need for full gradient computations, achieving this through a randomized reshuffling strategy that aggregates stochastic gradients per epoch, with numerical experiments demonstrating its efficiency.

The StochAstic Recursive grAdient algoritHm (SARAH) algorithm is a variance reduced variant of the Stochastic Gradient Descent (SGD) algorithm that needs a gradient of the objective function from time to time. In this paper, we remove the necessity of a full gradient computation. This is achieved by using a randomized reshuffling strategy and aggregating stochastic gradients obtained in each epoch. The aggregated stochastic gradients serve as an estimate of a full gradient in the SARAH algorithm. We provide a theoretical analysis of the proposed approach and conclude the paper with numerical experiments that demonstrate the efficiency of this approach.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes