OC Jan 25, 2021
Complexity of Linear Minimization and Projection on Some Sets
Cyrille W. Combettes, Sebastian Pokutta
The Frank-Wolfe algorithm is a method for constrained optimization that relies on linear minimizations, as opposed to projections. Therefore, a motivation put forward in a large body of work on the Frank-Wolfe algorithm is the computational advantage of solving linear minimizations instead of projections. However, the discussions supporting this advantage are often too succinct or incomplete. In this paper, we review the complexity bounds for both tasks on several sets commonly used in optimization. Projection methods onto the $\ell_p$-ball, $p\in\left]1,2\right[\cup\left]2,+\infty\right[$, and the Birkhoff polytope are also proposed.
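To make the contrast between the two tasks concrete, here is a minimal sketch for one common set, the $\ell_1$-ball. The choice of set, the function names `lmo_l1_ball` and `project_l1_ball`, and the sort-based projection routine are illustrative assumptions and are not taken from the paper. The linear minimization only needs the largest-magnitude coordinate of the gradient, while this particular projection routine sorts the input first.

```python
import math


def lmo_l1_ball(g, radius=1.0):
    """Return a minimizer of <g, v> over the l1-ball of the given radius.

    A minimizer is a signed, scaled coordinate vector at the index where
    |g| is largest, so a single pass over g suffices.
    """
    i = max(range(len(g)), key=lambda k: abs(g[k]))
    v = [0.0] * len(g)
    if g[i] != 0:
        v[i] = -radius if g[i] > 0 else radius
    return v


def project_l1_ball(y, radius=1.0):
    """Euclidean projection of y onto the l1-ball (standard sort-based method)."""
    if sum(abs(t) for t in y) <= radius:
        return list(y)  # already feasible
    u = sorted((abs(t) for t in y), reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        t = (cumsum - radius) / k
        if uk - t > 0:
            theta = t  # keep the threshold from the last index that qualifies
    # Soft-threshold the magnitudes and restore the signs.
    return [math.copysign(max(abs(t) - theta, 0.0), t) for t in y]


if __name__ == "__main__":
    print(lmo_l1_ball([0.5, -2.0, 1.0]))
    print(project_l1_ball([3.0, 1.0, 0.0], radius=2.0))
```

In this sketch the linear minimization costs one linear scan, whereas the projection is dominated by the sort.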
OC Sep 29, 2020
Projection-Free Adaptive Gradients for Large-Scale Optimization
Cyrille W. Combettes, Christoph Spiegel, Sebastian Pokutta
The complexity in large-scale optimization can lie in both handling the objective function and handling the constraint set. In this respect, stochastic Frank-Wolfe algorithms occupy a unique position as they alleviate both computational burdens, by querying only approximate first-order information from the objective and by maintaining feasibility of the iterates without using projections. In this paper, we improve the quality of their first-order information by blending in adaptive gradients. We derive convergence rates and demonstrate the computational advantage of our method over the state-of-the-art stochastic Frank-Wolfe algorithms on both convex and nonconvex objectives. The experiments further show that our method can improve the performance of adaptive gradient algorithms for constrained optimization.
OC Mar 13, 2020
Boosting Frank-Wolfe by Chasing Gradients
Cyrille W. Combettes, Sebastian Pokutta
The Frank-Wolfe algorithm has become a popular first-order optimization algorithm because it is simple and projection-free, and it has been successfully applied to a variety of real-world problems. Its main drawback, however, lies in its convergence rate, which can be excessively slow due to naive descent directions. We propose to speed up the Frank-Wolfe algorithm by better aligning the descent direction with that of the negative gradient via a subroutine. This subroutine chases the negative gradient direction in a matching pursuit style while still preserving the projection-free property. Although the approach is reasonably natural, it produces very significant results. We derive convergence rates ranging from $\mathcal{O}(1/t)$ to $\mathcal{O}(e^{-\omega t})$ for our method and we demonstrate its competitive advantage both per iteration and in CPU time over the state-of-the-art in a series of computational experiments.
OC Nov 11, 2019
Revisiting the Approximate Carathéodory Problem via the Frank-Wolfe Algorithm
Cyrille W. Combettes, Sebastian Pokutta
The approximate Carathéodory theorem states that given a compact convex set $\mathcal{C}\subset\mathbb{R}^n$ and $p\in\left[2,+\infty\right[$, each point $x^*\in\mathcal{C}$ can be approximated to $\varepsilon$-accuracy in the $\ell_p$-norm as the convex combination of $\mathcal{O}(pD_p^2/\varepsilon^2)$ vertices of $\mathcal{C}$, where $D_p$ is the diameter of $\mathcal{C}$ in the $\ell_p$-norm. A solution satisfying these properties can be built using probabilistic arguments or by applying mirror descent to the dual problem. We revisit the approximate Carathéodory problem by solving the primal problem via the Frank-Wolfe algorithm, providing a simplified analysis and leading to an efficient practical method. Furthermore, improved cardinality bounds are derived naturally using existing convergence rates of the Frank-Wolfe algorithm in different scenarios, when $x^*$ is in the interior of $\mathcal{C}$, when $x^*$ is the convex combination of a subset of vertices with small diameter, or when $\mathcal{C}$ is uniformly convex. We also propose cardinality bounds when $p\in\left[1,2\right[\cup\{+\infty\}$ via a nonsmooth variant of the algorithm. Lastly, we address the problem of finding sparse approximate projections onto $\mathcal{C}$ in the $\ell_p$-norm, $p\in\left[1,+\infty\right]$.
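The primal approach described above can be sketched for the special case $p=2$ with a polytope given by an explicit vertex list. This is a minimal illustration under assumptions that are not from the paper: the function name `approx_caratheodory`, the objective $\tfrac{1}{2}\|x-x^*\|_2^2$, and the standard step size $2/(t+2)$ are all illustrative choices. Each Frank-Wolfe iteration adds at most one vertex to the convex combination, which is where the sparsity comes from.

```python
def approx_caratheodory(vertices, target, num_iters):
    """Approximate `target` by a sparse convex combination of `vertices`.

    Runs vanilla Frank-Wolfe on f(x) = 0.5 * ||x - target||_2^2 over the
    convex hull of `vertices` (an assumed setup: p = 2, explicit vertex list).
    Returns the final point and a dict {vertex index: weight}.
    """
    n = len(target)
    weights = {0: 1.0}          # start at the first vertex
    x = list(vertices[0])
    for t in range(num_iters):
        grad = [x[j] - target[j] for j in range(n)]
        # Linear minimization: pick the vertex minimizing <grad, v>.
        i = min(
            range(len(vertices)),
            key=lambda k: sum(grad[j] * vertices[k][j] for j in range(n)),
        )
        gamma = 2.0 / (t + 2.0)  # standard open-loop step size
        for k in weights:
            weights[k] *= 1.0 - gamma
        weights[i] = weights.get(i, 0.0) + gamma
        x = [(1.0 - gamma) * x[j] + gamma * vertices[i][j] for j in range(n)]
    return x, {k: w for k, w in weights.items() if w > 0}


if __name__ == "__main__":
    simplex = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    point, combo = approx_caratheodory(simplex, [1 / 3, 1 / 3, 1 / 3], 200)
    print(point, combo)
```

After $T$ iterations the returned combination uses at most $T+1$ distinct vertices, so the iteration count directly controls the cardinality.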
OC Apr 28, 2019
Blended Matching Pursuit
Cyrille W. Combettes, Sebastian Pokutta
Matching pursuit algorithms are an important class of algorithms in signal processing and machine learning. We present a blended matching pursuit algorithm, combining coordinate descent-like steps with stronger gradient descent steps, for minimizing a smooth convex function over a linear space spanned by a set of atoms. We derive sublinear to linear convergence rates according to the smoothness and sharpness orders of the function and demonstrate computational superiority of our approach. In particular, we derive linear rates for a wide class of non-strongly convex functions, and we demonstrate in experiments that our algorithm enjoys very fast rates of convergence and wall-clock speed while maintaining a sparsity of iterates very comparable to that of the (much slower) orthogonal matching pursuit.
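For readers unfamiliar with the baseline, the following is a minimal sketch of plain matching pursuit on a least-squares objective. It is not the blended algorithm from the paper: the function name `matching_pursuit`, the objective $\tfrac{1}{2}\|\sum_i c_i a_i - y\|_2^2$, and the exact line search are assumptions chosen for illustration, and the atom selection rule assumes atoms of comparable norm. Each iteration updates a single coefficient, which is the coordinate descent-like step the abstract refers to.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(s * t for s, t in zip(a, b))


def matching_pursuit(atoms, y, num_iters):
    """Greedy least-squares approximation of y in the span of `atoms`.

    Each step selects the atom most correlated with the current residual
    and takes an exact line-search step along it.
    Returns the coefficient list and the final residual.
    """
    coeffs = [0.0] * len(atoms)
    residual = list(y)
    for _ in range(num_iters):
        corr = [dot(a, residual) for a in atoms]
        i = max(range(len(atoms)), key=lambda k: abs(corr[k]))
        norm2 = dot(atoms[i], atoms[i])
        if corr[i] == 0 or norm2 == 0:
            break  # residual is orthogonal to every atom
        step = corr[i] / norm2
        coeffs[i] += step
        residual = [r - step * a for r, a in zip(residual, atoms[i])]
    return coeffs, residual


if __name__ == "__main__":
    print(matching_pursuit([[1.0, 0.0], [0.0, 1.0]], [3.0, -2.0], 5))
```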