OCLGMLJul 4, 2018

SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator

arXiv:1807.01695v2672 citations
Originality Highly original
AI Analysis

This work addresses computational efficiency in non-convex optimization for machine learning and AI practitioners, offering significant improvements over existing methods.

The paper tackles non-convex stochastic optimization by proposing SPIDER, a technique that reduces computational cost for tracking deterministic quantities, and applies it to first-order and zeroth-order methods. It achieves record-breaking gradient computation costs, such as O(min(n^{1/2} ε^{-2}, ε^{-3})) for finding ε-approximate first-order stationary points, and nearly matches algorithmic lower bounds in some settings.

In this paper, we propose a new technique named \textit{Stochastic Path-Integrated Differential EstimatoR} (SPIDER), which can be used to track many deterministic quantities of interest with significantly reduced computational cost. We apply SPIDER to two tasks, namely the stochastic first-order and zeroth-order methods. For stochastic first-order method, combining SPIDER with normalized gradient descent, we propose two new algorithms, namely SPIDER-SFO and SPIDER-SFO\textsuperscript{+}, that solve non-convex stochastic optimization problems using stochastic gradients only. We provide sharp error-bound results on their convergence rates. In special, we prove that the SPIDER-SFO and SPIDER-SFO\textsuperscript{+} algorithms achieve a record-breaking gradient computation cost of $\mathcal{O}\left( \min( n^{1/2} ε^{-2}, ε^{-3} ) \right)$ for finding an $ε$-approximate first-order and $\tilde{\mathcal{O}}\left( \min( n^{1/2} ε^{-2}+ε^{-2.5}, ε^{-3} ) \right)$ for finding an $(ε, \mathcal{O}(ε^{0.5}))$-approximate second-order stationary point, respectively. In addition, we prove that SPIDER-SFO nearly matches the algorithmic lower bound for finding approximate first-order stationary points under the gradient Lipschitz assumption in the finite-sum setting. For stochastic zeroth-order method, we prove a cost of $\mathcal{O}( d \min( n^{1/2} ε^{-2}, ε^{-3}) )$ which outperforms all existing results.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes