OCLGMLJun 19, 2020

How Does Momentum Help Frank Wolfe?

arXiv:2006.11116v17 citations
Originality Incremental advance
AI Analysis

This work addresses optimization efficiency for machine learning practitioners, offering a competitive alternative to standard FW with minimal computational overhead.

The paper investigates the role of momentum in Frank Wolfe (FW) algorithms, showing that while momentum is generally ineffective for FW, it accelerates convergence to a rate of $ ilde{\cal O}( rac{1}{k^2})$ on specific constraint sets, with numerical experiments validating these findings.

We unveil the connections between Frank Wolfe (FW) type algorithms and the momentum in Accelerated Gradient Methods (AGM). On the negative side, these connections illustrate why momentum is unlikely to be effective for FW type algorithms. The encouraging message behind this link, on the other hand, is that momentum is useful for FW on a class of problems. In particular, we prove that a momentum variant of FW, that we term accelerated Frank Wolfe (AFW), converges with a faster rate $\tilde{\cal O}(\frac{1}{k^2})$ on certain constraint sets despite the same ${\cal O}(\frac{1}{k})$ rate as FW on general cases. Given the possible acceleration of AFW at almost no extra cost, it is thus a competitive alternative to FW. Numerical experiments on benchmarked machine learning tasks further validate our theoretical findings.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes