OCDSLGJun 25, 2019

Complexity of Highly Parallel Non-Smooth Convex Optimization

arXiv:1906.10655v262 citations
Originality Highly original
AI Analysis

This addresses a foundational problem in optimization theory for researchers, providing improved lower bounds and a potentially optimal method in the parallel setting.

The paper tackles the problem of parallel non-smooth convex optimization, showing that gradient descent is optimal only up to about sqrt(d) rounds with a highly parallel gradient oracle, and proposes a new method with improved complexity beyond that point.

A landmark result of non-smooth convex optimization is that gradient descent is an optimal algorithm whenever the number of computed gradients is smaller than the dimension $d$. In this paper we study the extension of this result to the parallel optimization setting. Namely we consider optimization algorithms interacting with a highly parallel gradient oracle, that is one that can answer $\mathrm{poly}(d)$ gradient queries in parallel. We show that in this case gradient descent is optimal only up to $\tilde{O}(\sqrt{d})$ rounds of interactions with the oracle. The lower bound improves upon a decades old construction by Nemirovski which proves optimality only up to $d^{1/3}$ rounds (as recently observed by Balkanski and Singer), and the suboptimality of gradient descent after $\sqrt{d}$ rounds was already observed by Duchi, Bartlett and Wainwright. In the latter regime we propose a new method with improved complexity, which we conjecture to be optimal. The analysis of this new method is based upon a generalized version of the recent results on optimal acceleration for highly smooth convex optimization.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes