LGFeb 2, 2022

DASHA: Distributed Nonconvex Optimization with Communication Compression, Optimal Oracle Complexity, and No Client Synchronization

arXiv:2202.01268v221 citations
Originality Incremental advance
AI Analysis

This addresses efficiency and practicality in federated learning by reducing synchronization and communication overhead, though it is incremental as it builds on existing methods like MARINA.

The paper tackles nonconvex distributed optimization by introducing DASHA, a family of methods that improve theoretical oracle and communication complexity over prior state-of-the-art, achieving optimal gradient computations of O(√m/(ε√n)) and O(σ/(ε^(3/2)n)) for finite-sum and expectation cases, respectively, while maintaining communication complexity of O(d/(ε√n)).

We develop and analyze DASHA: a new family of methods for nonconvex distributed optimization problems. When the local functions at the nodes have a finite-sum or an expectation form, our new methods, DASHA-PAGE and DASHA-SYNC-MVR, improve the theoretical oracle and communication complexity of the previous state-of-the-art method MARINA by Gorbunov et al. (2020). In particular, to achieve an epsilon-stationary point, and considering the random sparsifier RandK as an example, our methods compute the optimal number of gradients $\mathcal{O}\left(\frac{\sqrt{m}}{\varepsilon\sqrt{n}}\right)$ and $\mathcal{O}\left(\fracσ{\varepsilon^{3/2}n}\right)$ in finite-sum and expectation form cases, respectively, while maintaining the SOTA communication complexity $\mathcal{O}\left(\frac{d}{\varepsilon \sqrt{n}}\right)$. Furthermore, unlike MARINA, the new methods DASHA, DASHA-PAGE and DASHA-MVR send compressed vectors only and never synchronize the nodes, which makes them more practical for federated learning. We extend our results to the case when the functions satisfy the Polyak-Lojasiewicz condition. Finally, our theory is corroborated in practice: we see a significant improvement in experiments with nonconvex classification and training of deep learning models.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes