ML · LG · May 11, 2022

On Distributed Adaptive Optimization with Gradient Compression

arXiv:2205.05632v1 · 34 citations · h-index: 14
Originality: Incremental advance
AI Analysis

This is an incremental improvement in distributed machine learning training: it reduces communication overhead without sacrificing test accuracy.

The paper tackles the high communication cost of distributed adaptive optimization by proposing COMP-AMS, a framework that combines gradient compression with error feedback and achieves the same test accuracy as full-gradient AMSGrad while substantially reducing communication.

We study COMP-AMS, a distributed optimization framework based on gradient averaging and the adaptive AMSGrad algorithm. Gradient compression with error feedback is applied to reduce the communication cost of gradient transmission. Our convergence analysis of COMP-AMS shows that this compressed gradient averaging strategy yields the same convergence rate as standard AMSGrad and also exhibits a linear speedup with respect to the number of local workers. Compared with recently proposed protocols for distributed adaptive methods, COMP-AMS is simple and convenient. Numerical experiments are conducted to justify the theoretical findings and demonstrate that the proposed method can achieve the same test accuracy as full-gradient AMSGrad with substantial communication savings. With its simplicity and efficiency, COMP-AMS can serve as a useful distributed training framework for adaptive gradient methods.
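The scheme described in the abstract can be illustrated with a minimal single-process simulation. This is a sketch under stated assumptions, not the authors' code: it assumes top-k sparsification as the compressor, one simulated server that averages the workers' compressed gradients, and plain AMSGrad (no bias correction, epsilon added outside the square root) applied to that average. The names `top_k`, `Worker`, and `comp_ams` and all hyperparameter defaults are hypothetical; the paper's exact algorithm and compressors may differ. Each worker adds back the residual it failed to send in earlier rounds (error feedback), so no gradient information is permanently discarded.

```python
import numpy as np


def top_k(x: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude entries of x (1 <= k <= x.size); zero the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out


class Worker:
    """One simulated worker holding an error-feedback residual (hypothetical helper)."""

    def __init__(self, dim: int, k: int):
        self.k = k
        self.error = np.zeros(dim)  # gradient mass not yet transmitted

    def compress(self, grad: np.ndarray) -> np.ndarray:
        corrected = grad + self.error    # re-inject what earlier rounds dropped
        sent = top_k(corrected, self.k)  # only these k entries would be communicated
        self.error = corrected - sent    # carry the remainder to the next round
        return sent


def comp_ams(grad_fns, theta, steps=1000, k=1, lr=0.05,
             beta1=0.9, beta2=0.999, eps=1e-8):
    """Compressed gradient averaging followed by an AMSGrad step on the server."""
    workers = [Worker(theta.size, k) for _ in grad_fns]
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    v_hat = np.zeros_like(theta)
    for _ in range(steps):
        # Each worker compresses its local gradient at the current parameters.
        msgs = [w.compress(g(theta)) for w, g in zip(workers, grad_fns)]
        g_bar = np.mean(msgs, axis=0)    # server averages the compressed messages
        m = beta1 * m + (1 - beta1) * g_bar
        v = beta2 * v + (1 - beta2) * g_bar ** 2
        v_hat = np.maximum(v_hat, v)     # AMSGrad: second moment never decreases
        theta = theta - lr * m / (np.sqrt(v_hat) + eps)
    return theta


if __name__ == "__main__":
    # Toy problem (hypothetical data): worker i minimizes a_i/2 * ||theta - c||^2.
    rng = np.random.default_rng(0)
    c = rng.uniform(-1.0, 1.0, size=4)
    grad_fns = [lambda th, a=a: a * (th - c) for a in (1.0, 2.0, 3.0, 4.0)]
    theta = comp_ams(grad_fns, np.zeros(4), steps=3000, k=2)
    print(np.round(theta - c, 4))  # deviation from the shared minimizer
```

In a real deployment each worker would transmit only the k nonzero values and their indices rather than a dense vector; the toy problem keeps k at half the dimension purely for readability, and it does not attempt to demonstrate the linear-speedup result.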

