OC | SY | ML | Dec 5, 2016

Decentralized Frank-Wolfe Algorithm for Convex and Non-convex Problems

arXiv:1612.01216v3 | 108 citations
Originality: Incremental advance
AI Analysis

This work addresses scalable decentralized optimization for machine learning, offering an incremental improvement over existing methods by removing the costly projection step in constrained settings.

The paper tackles the cost of projection steps in decentralized optimization for high-dimensional constrained problems by proposing a projection-free decentralized Frank-Wolfe (DeFW) algorithm. With a diminishing step size, it converges at O(1/t) for convex objectives, O(1/t^2) for strongly convex objectives whose optimal solution lies in the interior of the constraint set, and O(1/√t) towards a stationary point for smooth but non-convex objectives.

Decentralized optimization algorithms have received much attention due to the recent advances in network information processing. However, conventional decentralized algorithms based on projected gradient descent are incapable of handling high-dimensional constrained problems, as the projection step becomes computationally prohibitive. To address this problem, this paper adopts a projection-free optimization approach, a.k.a. the Frank-Wolfe (FW) or conditional gradient algorithm. We first develop a decentralized FW (DeFW) algorithm from the classical FW algorithm. The convergence of the proposed algorithm is studied by viewing the decentralized algorithm as an inexact FW algorithm. Using a diminishing step size rule and letting $t$ be the iteration number, we show that the DeFW algorithm's convergence rate is ${\cal O}(1/t)$ for convex objectives; is ${\cal O}(1/t^2)$ for strongly convex objectives with the optimal solution in the interior of the constraint set; and is ${\cal O}(1/\sqrt{t})$ towards a stationary point for smooth but non-convex objectives. We then show that a consensus-based DeFW algorithm meets the above guarantees with two communication rounds per iteration. Furthermore, we demonstrate the advantages of the proposed DeFW algorithm on low-complexity robust matrix completion and communication-efficient sparse learning. Numerical results on synthetic and real data are presented to support our findings.
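
To make the consensus-based scheme described in the abstract concrete, here is a minimal Python sketch of a decentralized Frank-Wolfe loop on an $\ell_1$-constrained least-squares problem. It is an illustration under stated assumptions, not the paper's exact pseudocode: the function names (`lmo_l1`, `ring_mixing_matrix`, `defw_sketch`), the ring-graph mixing matrix, the step size $\gamma_t = 2/(t+1)$, and the gradient-tracking form of the second communication round are all choices made for this example. Each iteration uses two communication rounds (one to average iterates, one to average gradient surrogates) and replaces projection with a linear minimization over the constraint set.

```python
import numpy as np

def lmo_l1(grad, radius):
    """Linear minimization oracle for the l1 ball:
    returns argmin over ||a||_1 <= radius of <grad, a> (a signed vertex)."""
    a = np.zeros_like(grad)
    k = int(np.argmax(np.abs(grad)))
    a[k] = -radius * np.sign(grad[k])
    return a

def ring_mixing_matrix(n):
    """Doubly stochastic weights for a ring graph (assumed topology)."""
    eye = np.eye(n)
    return 0.5 * eye + 0.25 * np.roll(eye, 1, axis=1) + 0.25 * np.roll(eye, -1, axis=1)

def local_grads(A_list, b_list, points):
    """Gradient of f_i(x) = 0.5 * ||A_i x - b_i||^2 at agent i's own point."""
    return np.stack([A.T @ (A @ x - b) for A, b, x in zip(A_list, b_list, points)])

def defw_sketch(A_list, b_list, W, radius, num_iters=200):
    """Hypothetical consensus-based decentralized Frank-Wolfe loop.
    Row i of every (n, d) array belongs to agent i; multiplying by W
    models one round of communication with neighbours."""
    n, d = len(A_list), A_list[0].shape[1]
    theta_bar = W @ np.zeros((n, d))           # round 1: average iterates
    g_local = local_grads(A_list, b_list, theta_bar)
    g_track = g_local.copy()                   # surrogate of the global gradient
    for t in range(1, num_iters + 1):
        g_avg = W @ g_track                    # round 2: average gradient surrogates
        gamma = 2.0 / (t + 1)                  # diminishing step size (assumed rule)
        atoms = np.stack([lmo_l1(g_avg[i], radius) for i in range(n)])
        theta = (1.0 - gamma) * theta_bar + gamma * atoms   # projection-free update
        theta_bar = W @ theta                  # round 1 of the next iteration
        g_new = local_grads(A_list, b_list, theta_bar)
        g_track = g_avg + g_new - g_local      # gradient tracking (assumed form)
        g_local = g_new
    return theta_bar
```

Because every update is a convex combination of points inside the $\ell_1$ ball, each agent's iterate stays feasible without any projection. Calling `defw_sketch(A_list, b_list, ring_mixing_matrix(len(A_list)), radius)` returns one row per agent; averaging the rows gives a single estimate.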

