OC · DC · LG · May 30, 2022

Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity

arXiv:2205.15136v1 · 14 citations · h-index: 50
Originality: Highly original
AI Analysis

This work addresses a long-standing open problem in distributed optimization by providing optimal complexity bounds for communication and computation, which is significant for large-scale machine learning and data analysis applications.

The paper tackles structured convex optimization with additive objectives r = p + q by proposing an inexact accelerated gradient sliding method that skips gradient computations for one component while achieving the optimal number of gradient calls for each: O(√(L_p/μ)) for p and O(√(L_q/μ)) for q. Applied to distributed optimization under function similarity, the method matches the lower complexity bounds on both communication and local gradient calls for the first time.

We study structured convex optimization problems with additive objective $r := p + q$, where $r$ is ($μ$-strongly) convex, $q$ is $L_q$-smooth and convex, and $p$ is $L_p$-smooth, possibly nonconvex. For this class of problems, we propose an inexact accelerated gradient sliding method that can skip the gradient computation for one of these components while still achieving the optimal complexity of gradient calls of $p$ and $q$, that is, $\mathcal{O}(\sqrt{L_p/μ})$ and $\mathcal{O}(\sqrt{L_q/μ})$, respectively. This result is much sharper than the classic black-box complexity $\mathcal{O}(\sqrt{(L_p+L_q)/μ})$, especially when the difference between $L_p$ and $L_q$ is large. We then apply the proposed method to distributed optimization problems over master-worker architectures, under agents' function similarity, due to statistical data similarity or otherwise. The distributed algorithm achieves, for the first time, the lower complexity bounds on both communication and local gradient calls, the former having been a long-standing open problem. Finally, the method is extended to distributed saddle-point problems (under function similarity) by solving a class of variational inequalities, again achieving the lower communication and computation complexity bounds.
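To make the gap between the two complexity expressions concrete, the following minimal sketch evaluates them for one set of made-up constants. The function names and the values of `L_p`, `L_q`, and `mu` are assumptions chosen purely for illustration. The hidden constants and logarithmic factors in the $\mathcal{O}(\cdot)$ notation are ignored, so the numbers indicate orders of magnitude only, not actual iteration counts of the paper's method.

```python
import math


def black_box_calls(L_p: float, L_q: float, mu: float) -> float:
    """Order of gradient calls when p and q are treated as a single
    (L_p + L_q)-smooth objective: both gradients are called this often."""
    return math.sqrt((L_p + L_q) / mu)


def sliding_calls(L_p: float, L_q: float, mu: float) -> tuple[float, float]:
    """Order of gradient calls of p and of q, counted separately,
    as stated for the gradient sliding method in the abstract."""
    return math.sqrt(L_p / mu), math.sqrt(L_q / mu)


# Hypothetical constants (not from the paper): p is much "smoother" than q.
L_p, L_q, mu = 1e2, 1e6, 1.0

both = black_box_calls(L_p, L_q, mu)
calls_p, calls_q = sliding_calls(L_p, L_q, mu)

print(f"black-box: about {both:.0f} calls of each gradient")
print(f"sliding:   about {calls_p:.0f} calls of grad p, {calls_q:.0f} calls of grad q")
```

With these illustrative values the number of calls to the gradient of $q$ is essentially unchanged, while the calls to the gradient of $p$ drop by roughly two orders of magnitude. The saving matters when $\nabla p$ is the expensive operation, as with communication in the distributed setting described above.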
