OC, LG · Feb 19, 2019

Stochastic Conditional Gradient++

arXiv:1902.06992v4 · 24 citations
AI Analysis

This paper addresses stochastic optimization problems in which the randomness depends on the point where the function is evaluated, and gives efficient conditional gradient algorithms whose convergence rates are provably optimal in some settings.

The paper tackles non-oblivious stochastic optimization by developing Stochastic Frank-Wolfe++ (SFW++), which converges to an ε-first-order stationary point using O(1/ε³) stochastic gradients. The rate improves to O(1/ε²) for convex and DR-submodular functions, which is optimal in some cases.

In this paper, we consider the general non-oblivious stochastic optimization where the underlying stochasticity may change during the optimization procedure and depends on the point at which the function is evaluated. We develop Stochastic Frank-Wolfe++ ($\text{SFW}{++}$), an efficient variant of the conditional gradient method for minimizing a smooth non-convex function subject to a convex body constraint. We show that $\text{SFW}{++}$ converges to an $\epsilon$-first-order stationary point by using $O(1/\epsilon^3)$ stochastic gradients. Once further structures are present, $\text{SFW}{++}$'s theoretical guarantees, in terms of the convergence rate and quality of its solution, improve. In particular, for minimizing a convex function, $\text{SFW}{++}$ achieves an $\epsilon$-approximate optimum while using $O(1/\epsilon^2)$ stochastic gradients. It is known that this rate is optimal in terms of stochastic gradient evaluations. Similarly, for maximizing a monotone continuous DR-submodular function, a slightly different form of $\text{SFW}{++}$, called Stochastic Continuous Greedy++ ($\text{SCG}{++}$), achieves a tight $[(1-1/e)\text{OPT} - \epsilon]$ solution while using $O(1/\epsilon^2)$ stochastic gradients. Through an information-theoretic argument, we also prove that $\text{SCG}{++}$'s convergence rate is optimal. Finally, for maximizing a non-monotone continuous DR-submodular function, we can achieve a $[(1/e)\text{OPT} - \epsilon]$ solution by using $O(1/\epsilon^2)$ stochastic gradients. We should highlight that our results and our novel variance reduction technique trivially extend to the standard and easier oblivious stochastic optimization settings for (non-)convex and continuous submodular settings.
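To make the idea of a variance-reduced conditional gradient method concrete, here is a minimal Python sketch. It is not the paper's SFW++ algorithm: it assumes the easier oblivious setting (the sampling distribution does not depend on the iterate), a convex least-squares objective, and the probability simplex as the constraint set. The gradient estimator is a generic recursive one, refreshed periodically with a large batch and updated in between with mini-batch gradient differences, swapped in purely for illustration. All names (`vr_frank_wolfe`, `lmo_simplex`, `sample`, `batch_grad`), the data model, and the batch sizes are hypothetical choices.

```python
import random


def lmo_simplex(g):
    """Linear minimization oracle over the probability simplex:
    returns the vertex e_i whose coordinate has the smallest g_i."""
    i = min(range(len(g)), key=lambda k: g[k])
    return [1.0 if k == i else 0.0 for k in range(len(g))]


def sample(rng, x_true, sigma):
    """Draw one data point (a, b) with b = <a, x_true> + noise (assumed data model)."""
    a = [rng.gauss(0.0, 1.0) for _ in x_true]
    b = sum(ai * xi for ai, xi in zip(a, x_true)) + rng.gauss(0.0, sigma)
    return a, b


def batch_grad(batch, x):
    """Mini-batch average of the gradient of 0.5 * (<a, x> - b)^2."""
    g = [0.0] * len(x)
    for a, b in batch:
        r = sum(ai * xi for ai, xi in zip(a, x)) - b
        for k, ak in enumerate(a):
            g[k] += r * ak / len(batch)
    return g


def vr_frank_wolfe(x_true, steps=600, period=50, big=512, small=32,
                   sigma=0.1, seed=0):
    """Frank-Wolfe over the simplex with a recursive gradient estimator."""
    rng = random.Random(seed)
    d = len(x_true)
    x = [1.0] + [0.0] * (d - 1)  # start at a vertex of the simplex
    x_prev, g = x, None
    for t in range(steps):
        if t % period == 0:
            # Periodic refresh: large-batch gradient estimate at the current point.
            batch = [sample(rng, x_true, sigma) for _ in range(big)]
            g = batch_grad(batch, x)
        else:
            # Recursive update: add the gradient difference between the current
            # and previous iterates, evaluated on the same fresh small batch.
            batch = [sample(rng, x_true, sigma) for _ in range(small)]
            g_new, g_old = batch_grad(batch, x), batch_grad(batch, x_prev)
            g = [gk + gn - go for gk, gn, go in zip(g, g_new, g_old)]
        v = lmo_simplex(g)           # projection-free step: one linear minimization
        gamma = 2.0 / (t + 2)        # standard Frank-Wolfe step size
        x_prev = x
        x = [(1 - gamma) * xk + gamma * vk for xk, vk in zip(x, v)]
    return x
```

Calling `vr_frank_wolfe([0.2, 0.3, 0.5])` returns a point on the simplex that should lie close to the planted minimizer. Every iterate is a convex combination of simplex vertices, so feasibility holds without any projection, which is the main appeal of conditional gradient methods. Because consecutive iterates differ by only `gamma` times a bounded direction, the gradient-difference terms have small variance late in the run, which is why small batches suffice between refreshes.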

