Fast Stochastic Composite Minimization and an Accelerated Frank-Wolfe Algorithm under Parallelization
This work addresses optimization problems arising in machine learning and related fields. It improves convergence rates for stochastic composite minimization and shows how parallel computation can accelerate Frank-Wolfe. The contribution is incremental, since it builds on existing Frank-Wolfe and stochastic optimization methods.
The paper minimizes the sum of two convex functions, one of which has Lipschitz-continuous gradients and is accessed through stochastic oracles. It develops a Bregman-type algorithm whose function values converge at an accelerated rate to a ball around the minimum, and extends this setup to an accelerated Frank-Wolfe variant that reaches an ε primal-dual gap in Õ(1/√ε) iterations when O(1/√ε) computing units run in parallel.
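To make the composite setting concrete, here is a minimal Python sketch of a generic accelerated proximal method with noisy gradients. It assumes the Bregman divergence is the squared Euclidean distance, so the Bregman step reduces to a proximal step on the "simple" function. It is not the authors' algorithm: the "similar triangles" step schedule, the cap `a_max` on the step weights, the test problem (a separable quadratic plus an ℓ1 term) and the Gaussian gradient noise of scale `sigma` are all illustrative assumptions, and every function name is hypothetical. With exact gradients the method converges at the accelerated O(1/k²) rate; with noisy gradients and capped weights it settles in a neighborhood of the minimum whose size grows with the noise variance, which loosely mirrors the "ball" behavior the paper describes.

```python
import math
import random


def soft_threshold(v, t):
    """Prox of t * ||.||_1, applied coordinate-wise (scalar threshold t)."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def accelerated_stochastic_prox(grad_oracle, prox, x0, L, n_iters, a_max=math.inf):
    """Hypothetical sketch: accelerated proximal method ("similar triangles" form)
    with a squared-Euclidean Bregman divergence and possibly noisy gradients.

    grad_oracle(y): (noisy) gradient of the smooth part f at y.
    prox(v, t): argmin_z t * g(z) + 0.5 * ||z - v||^2 for the simple part g.
    a_max: cap on the step weights; with noisy gradients a finite cap keeps the
    accumulated noise bounded, so iterates settle near the minimizer.
    """
    x, z, A = list(x0), list(x0), 0.0
    for _ in range(n_iters):
        # Largest a with L * a**2 <= A + a (accelerated schedule), then capped.
        a = min((1.0 + math.sqrt(1.0 + 4.0 * L * A)) / (2.0 * L), a_max)
        A_next = A + a
        y = [(A * xi + a * zi) / A_next for xi, zi in zip(x, z)]
        g = grad_oracle(y)
        # Bregman (here: proximal) step on z with weight a.
        z = prox([zi - a * gi for zi, gi in zip(z, g)], a)
        x = [(A * xi + a * zi) / A_next for xi, zi in zip(x, z)]
        A = A_next
    return x


# Hypothetical test problem: f(x) = 0.5 * sum_i c_i (x_i - b_i)^2, g(x) = lam * ||x||_1.
c = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [1.0, -2.0, 0.5, 3.0, -0.1]
lam = 0.5
L = max(c)  # Lipschitz constant of grad f


def F(x):
    smooth = sum(0.5 * ci * (xi - bi) ** 2 for ci, xi, bi in zip(c, x, b))
    return smooth + lam * sum(abs(xi) for xi in x)


# The problem is separable, so the exact minimizer is a per-coordinate soft threshold.
x_star = [math.copysign(max(abs(bi) - lam / ci, 0.0), bi) for ci, bi in zip(c, b)]
F_star = F(x_star)


def make_oracle(sigma, seed=0):
    """Exact gradient of f plus i.i.d. Gaussian noise of standard deviation sigma."""
    rng = random.Random(seed)
    return lambda y: [ci * (yi - bi) + rng.gauss(0.0, sigma) for ci, yi, bi in zip(c, y, b)]


def prox_l1(v, t):
    return soft_threshold(v, lam * t)  # prox of t * g with g = lam * ||.||_1


x0 = [0.0] * len(c)
x_exact = accelerated_stochastic_prox(make_oracle(0.0), prox_l1, x0, L, n_iters=500)
x_noisy = accelerated_stochastic_prox(make_oracle(0.05), prox_l1, x0, L, n_iters=2000, a_max=2.0)
print("gap, exact gradients:", F(x_exact) - F_star)
print("gap, noisy gradients:", F(x_noisy) - F_star)
```

Roughly speaking, a larger `a_max` lengthens the accelerated phase and shrinks the bias term, but lets more gradient noise into the final iterate; this is a speed versus accuracy trade-off of the same flavor as the variance-dependent ball radius.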
We consider the problem of minimizing the sum of two convex functions. One of those functions has Lipschitz-continuous gradients, and can be accessed via stochastic oracles, whereas the other is "simple". We provide a Bregman-type algorithm with accelerated convergence in function values to a ball containing the minimum. The radius of this ball depends on problem-dependent constants, including the variance of the stochastic oracle. We further show that this algorithmic setup naturally leads to a variant of Frank-Wolfe achieving acceleration under parallelization. More precisely, when minimizing a smooth convex function on a bounded domain, we show that one can achieve an $\varepsilon$ primal-dual gap (in expectation) in $\tilde{O}(1/\sqrt{\varepsilon})$ iterations, by only accessing gradients of the original function and a linear maximization oracle with $O(1/\sqrt{\varepsilon})$ computing units in parallel. We illustrate this fast convergence on synthetic numerical experiments.
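The role of parallel linear maximization oracle calls can be illustrated with a standard randomized-smoothing construction. We assume this construction for illustration only; the abstract does not state that it is the paper's estimator. Averaging oracle outputs at randomly perturbed directions gives an unbiased stochastic gradient of a smoothed support function, and independent calls can run on separate computing units, so the variance falls as the number of units grows. The sketch uses the probability simplex as the bounded domain and Gumbel perturbations, for which the exact expectation is softmax(θ/η) (the Gumbel-max trick), giving a checkable target. It also computes the standard Frank-Wolfe gap, one common primal-dual certificate, with a single oracle call. The function names and parameters (`eta`, `n_units`, `samples_per_unit`) are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def lmo_simplex(theta):
    """Linear maximization oracle on the probability simplex: a maximizer of
    <s, theta> over the simplex is the vertex e_j with j = argmax_j theta_j."""
    j = max(range(len(theta)), key=lambda i: theta[i])
    return [1.0 if i == j else 0.0 for i in range(len(theta))]


def gumbel(rng):
    """Standard Gumbel sample by inverse CDF: -log(-log(U)), U uniform on (0, 1)."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0, where log is undefined
        u = rng.random()
    return -math.log(-math.log(u))


def perturbed_lmo_average(theta, eta, n_units, samples_per_unit=1, seed=0):
    """Hypothetical estimator: average of LMO outputs at theta + eta * Z, Z Gumbel.

    The result is an unbiased estimate of the gradient of the smoothed support
    function E[max_s <s, theta + eta * Z>]. Each unit draws independent noise,
    so the variance shrinks like 1 / (n_units * samples_per_unit). Threads stand
    in for computing units; in CPython, processes would be needed for a real speedup.
    """
    def unit(k):
        rng = random.Random(seed + k)  # separate random stream per unit
        counts = [0.0] * len(theta)
        for _ in range(samples_per_unit):
            s = lmo_simplex([t + eta * gumbel(rng) for t in theta])
            counts = [ci + si for ci, si in zip(counts, s)]
        return counts

    with ThreadPoolExecutor() as pool:
        per_unit = list(pool.map(unit, range(n_units)))
    total = n_units * samples_per_unit
    return [sum(p[i] for p in per_unit) / total for i in range(len(theta))]


def fw_gap(grad, x):
    """Frank-Wolfe gap max_{s in simplex} <grad, x - s>, using one LMO call on -grad.
    For convex f it upper-bounds f(x) - min f over the simplex."""
    s = lmo_simplex([-g for g in grad])
    return sum(g * (xi - si) for g, xi, si in zip(grad, x, s))


def softmax(v):
    m = max(v)
    e = [math.exp(vi - m) for vi in v]
    return [ei / sum(e) for ei in e]


theta, eta = [1.0, 0.5, -0.3], 0.5
est = perturbed_lmo_average(theta, eta, n_units=200, samples_per_unit=100)
# Gumbel-max trick: the exact expectation is softmax(theta / eta).
print(est, softmax([t / eta for t in theta]))
print(fw_gap([0.8, -0.3, -0.5], [1.0, 0.0, 0.0]))
```

With `samples_per_unit` fixed, the variance of the averaged estimate scales like 1/`n_units`, which is one way to read the trade-off between the number of parallel computing units and the accuracy of the oracle available at each iteration.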