OCLGMLJan 22, 2019

Finite-Sum Smooth Optimization with SARAH

arXiv:1901.07648v248 citations
AI Analysis

This work addresses the computational efficiency of stochastic optimization for machine learning practitioners, providing a near-optimal algorithm for nonconvex problems, though it is incremental as it builds on existing SARAH methods.

The paper tackles the problem of optimizing finite-sum smooth nonconvex functions by analyzing a modified SARAH algorithm, achieving a total complexity that matches the known lower bound of Ω(√n/ε) up to a constant factor for n ≤ O(ε⁻²). For convex cases, it proposes SARAH++ with sublinear or linear convergence and demonstrates improved performance in experiments.

The total complexity (measured as the total number of gradient computations) of a stochastic first-order optimization algorithm that finds a first-order stationary point of a finite-sum smooth nonconvex objective function $F(w)=\frac{1}{n} \sum_{i=1}^n f_i(w)$ has been proven to be at least $Ω(\sqrt{n}/ε)$ for $n \leq \mathcal{O}(ε^{-2})$ where $ε$ denotes the attained accuracy $\mathbb{E}[ \|\nabla F(\tilde{w})\|^2] \leq ε$ for the outputted approximation $\tilde{w}$ (Fang et al., 2018). In this paper, we provide a convergence analysis for a slightly modified version of the SARAH algorithm (Nguyen et al., 2017a;b) and achieve total complexity that matches the lower-bound worst case complexity in (Fang et al., 2018) up to a constant factor when $n \leq \mathcal{O}(ε^{-2})$ for nonconvex problems. For convex optimization, we propose SARAH++ with sublinear convergence for general convex and linear convergence for strongly convex problems; and we provide a practical version for which numerical experiments on various datasets show an improved performance.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes