LG · Feb 5, 2016

Variance-Reduced and Projection-Free Stochastic Optimization

arXiv:1602.02101v2 · 179 citations
Originality: Incremental advance
AI Analysis

This work addresses the understudied area of stochastic Frank-Wolfe optimization, offering faster convergence for machine learning applications with structured constraints, though it is incremental as it builds on existing variance reduction techniques.

The paper improves the efficiency of stochastic Frank-Wolfe optimization by proposing two variants that reduce the number of stochastic gradient evaluations needed to reach 1-ε accuracy: from O(1/ε) to O(ln(1/ε)) when the objective is smooth and strongly convex, and from O(1/ε^2) to O(1/ε^1.5) when the objective is smooth and Lipschitz.

The Frank-Wolfe optimization algorithm has recently regained popularity for machine learning applications due to its projection-free property and its ability to handle structured constraints. However, in the stochastic learning setting, it is still relatively understudied compared to the gradient descent counterpart. In this work, leveraging a recent variance reduction technique, we propose two stochastic Frank-Wolfe variants which substantially improve previous results in terms of the number of stochastic gradient evaluations needed to achieve $1-ε$ accuracy. For example, we improve from $O(\frac{1}{ε})$ to $O(\ln\frac{1}{ε})$ if the objective function is smooth and strongly convex, and from $O(\frac{1}{ε^2})$ to $O(\frac{1}{ε^{1.5}})$ if the objective function is smooth and Lipschitz. The theoretical improvement is also observed in experiments on real-world datasets for a multiclass classification application.
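To make the idea in the abstract concrete, the following is a minimal sketch of a Frank-Wolfe loop driven by an SVRG-style variance-reduced gradient estimate. It is an illustration under stated assumptions, not the paper's algorithm or experimental setup: the least-squares objective, the probability-simplex constraint, the fixed batch size and epoch length, and every function name here are hypothetical choices made for brevity. The step size 2/(k+1) is the classical Frank-Wolfe schedule.

```python
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def grad_i(a_i, b_i, x):
    """Gradient of the assumed component f_i(x) = 0.5 * (a_i . x - b_i)^2."""
    r = dot(a_i, x) - b_i
    return [r * a for a in a_i]


def full_grad(A, b, x):
    """Exact gradient of f(x) = (1/n) * sum_i f_i(x)."""
    n = len(A)
    g = [0.0] * len(x)
    for a_i, b_i in zip(A, b):
        g = [gj + gij / n for gj, gij in zip(g, grad_i(a_i, b_i, x))]
    return g


def vr_gradient(A, b, x, snapshot, mu, batch, rng):
    """Variance-reduced estimate: mu + mean_i [grad_i(x) - grad_i(snapshot)]."""
    g = list(mu)
    for _ in range(batch):
        i = rng.randrange(len(A))
        gx = grad_i(A[i], b[i], x)
        gs = grad_i(A[i], b[i], snapshot)
        g = [gj + (p - q) / batch for gj, p, q in zip(g, gx, gs)]
    return g


def lmo_simplex(g):
    """Linear minimization over the probability simplex: a single vertex."""
    j = min(range(len(g)), key=lambda k: g[k])
    v = [0.0] * len(g)
    v[j] = 1.0
    return v


def vr_frank_wolfe(A, b, epochs=5, inner_steps=50, batch=4, seed=0):
    rng = random.Random(seed)
    d = len(A[0])
    x = [1.0 / d] * d  # start at the simplex centre
    for _ in range(epochs):
        snapshot = list(x)
        mu = full_grad(A, b, snapshot)  # one full gradient per epoch
        for k in range(1, inner_steps + 1):
            g = vr_gradient(A, b, x, snapshot, mu, batch, rng)
            v = lmo_simplex(g)  # projection-free: only a linear problem
            gamma = 2.0 / (k + 1)
            # Convex combination keeps x inside the simplex.
            x = [(1 - gamma) * xj + gamma * vj for xj, vj in zip(x, v)]
    return x
```

Each iterate is a convex combination of feasible points, so no projection step is ever needed; the only constraint-dependent operation is `lmo_simplex`, which would be swapped for a different linear minimization oracle under another constraint set.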
