LGOCOct 7, 2020

Optimal Gradient Compression for Distributed and Federated Learning

arXiv:2010.03246v172 citations
Originality Highly original
AI Analysis

This work addresses scalability issues in distributed and federated learning by improving communication efficiency, offering incremental advancements in compression techniques.

The paper tackles the communication bottleneck in distributed and federated learning by analyzing the trade-off between bits and compression error, introducing efficient compression operators like Sparse Dithering and Spherical Compression that achieve tight lower bounds and significantly outperform state-of-the-art methods.

Communicating information, like gradient vectors, between computing nodes in distributed and federated learning is typically an unavoidable burden, resulting in scalability issues. Indeed, communication might be slow and costly. Recent advances in communication-efficient training algorithms have reduced this bottleneck by using compression techniques, in the form of sparsification, quantization, or low-rank approximation. Since compression is a lossy, or inexact, process, the iteration complexity is typically worsened; but the total communication complexity can improve significantly, possibly leading to large computation time savings. In this paper, we investigate the fundamental trade-off between the number of bits needed to encode compressed vectors and the compression error. We perform both worst-case and average-case analysis, providing tight lower bounds. In the worst-case analysis, we introduce an efficient compression operator, Sparse Dithering, which is very close to the lower bound. In the average-case analysis, we design a simple compression operator, Spherical Compression, which naturally achieves the lower bound. Thus, our new compression schemes significantly outperform the state of the art. We conduct numerical experiments to illustrate this improvement.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes