OC DS LGJun 25, 2019

Complexity of Highly Parallel Non-Smooth Convex Optimization

Sébastien Bubeck, Qijia Jiang, Yin Tat Lee, Yuanzhi Li, Aaron Sidford

arXiv:1906.10655v219.862 citations

Originality Highly original

AI Analysis

This addresses a foundational problem in optimization theory for researchers, providing improved lower bounds and a potentially optimal method in the parallel setting.

The paper tackles the problem of parallel non-smooth convex optimization, showing that gradient descent is optimal only up to about sqrt(d) rounds with a highly parallel gradient oracle, and proposes a new method with improved complexity beyond that point.

A landmark result of non-smooth convex optimization is that gradient descent is an optimal algorithm whenever the number of computed gradients is smaller than the dimension $d$. In this paper we study the extension of this result to the parallel optimization setting. Namely we consider optimization algorithms interacting with a highly parallel gradient oracle, that is one that can answer $\mathrm{poly}(d)$ gradient queries in parallel. We show that in this case gradient descent is optimal only up to $\tilde{O}(\sqrt{d})$ rounds of interactions with the oracle. The lower bound improves upon a decades old construction by Nemirovski which proves optimality only up to $d^{1/3}$ rounds (as recently observed by Balkanski and Singer), and the suboptimality of gradient descent after $\sqrt{d}$ rounds was already observed by Duchi, Bartlett and Wainwright. In the latter regime we propose a new method with improved complexity, which we conjecture to be optimal. The analysis of this new method is based upon a generalized version of the recent results on optimal acceleration for highly smooth convex optimization.

View on arXiv PDF

Similar