LG · DC · OC · May 28, 2022

Efficient-Adam: Communication-Efficient Distributed Adam

arXiv:2205.14473v2 · 34 citations · h-index: 88
AI Analysis

This work addresses communication bottlenecks in distributed deep learning training and offers a practical improvement for large-scale model optimization. The contribution is incremental, however, because it builds on existing Adam methods.

The paper tackles the high communication cost of distributed Adam in nonconvex optimization. It proposes Efficient-Adam, which combines two-way quantization with error feedback to reduce communication, backs the method with theoretical guarantees, and demonstrates its effectiveness in experiments on vision and language tasks.
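A quick calculation shows why the "two-way" part matters. It rests on assumptions that do not come from the paper: a model with one million parameters, eight workers, float32 messages as the uncompressed baseline, a quantizer that sends 1 bit per coordinate plus one float32 scale, and one uplink plus one downlink message per worker per round. The helpers `message_bits` and `round_bits` are hypothetical.

```python
FLOAT_BITS = 32  # assumed uncompressed baseline: one float32 per coordinate


def message_bits(dim, quantized):
    """Size of one message in bits.

    Quantized (assumed scaled-sign style): 1 bit per coordinate plus one float32 scale.
    Uncompressed: one float32 per coordinate.
    """
    return dim + FLOAT_BITS if quantized else dim * FLOAT_BITS


def round_bits(dim, num_workers, quantize_up, quantize_down):
    """Bits moved per round: every worker uploads one message and receives one."""
    up = num_workers * message_bits(dim, quantize_up)      # workers -> server
    down = num_workers * message_bits(dim, quantize_down)  # server -> workers
    return up + down


d, n = 10**6, 8
full = round_bits(d, n, False, False)        # no compression in either direction
uplink_only = round_bits(d, n, True, False)  # quantize worker messages only
two_way = round_bits(d, n, True, True)       # quantize both directions
print(full, uplink_only, two_way)
```

Under these assumptions, two-way quantization cuts per-round traffic by about 32x. Quantizing only the uplink saves just under half, because the float32 downlink still dominates.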

Distributed adaptive stochastic gradient methods have been widely used for large-scale nonconvex optimization, such as training deep learning models. However, their communication complexity for finding $\varepsilon$-stationary points has rarely been analyzed in the nonconvex setting. In this work, we present a novel communication-efficient distributed Adam in the parameter-server model for stochastic nonconvex optimization, dubbed Efficient-Adam. Specifically, we incorporate a two-way quantization scheme into Efficient-Adam to reduce the communication cost between the workers and the server. At the same time, we adopt a two-way error feedback strategy to reduce the biases that the two-way quantization introduces on the server and on the workers. In addition, we establish the iteration complexity of Efficient-Adam for a class of quantization operators, and we further characterize its communication complexity between the server and the workers for reaching an $\varepsilon$-stationary point. Finally, we apply Efficient-Adam to a toy stochastic convex optimization problem and use it to train deep learning models on real-world vision and language tasks. Extensive experiments, together with the theoretical guarantees, justify the merits of Efficient-Adam.
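The abstract does not give the exact update order, so what follows is a minimal NumPy sketch of the general pattern it describes and not the authors' algorithm. Several pieces are assumptions. A scaled-sign compressor stands in for the paper's class of quantization operators. Workers compress their stochastic gradients, and the server runs standard Adam (with bias correction) on the averaged messages, then compresses its parameter update before broadcasting it. The names `scaled_sign`, `Worker`, `Server` and `run`, and the toy quadratic objective, are hypothetical.

```python
import numpy as np


def scaled_sign(x):
    """Assumed example quantizer: 1-bit signs times one scale, ||x||_1 / d."""
    return (np.abs(x).sum() / x.size) * np.sign(x)


class Worker:
    """Holds the worker-side error-feedback residual."""

    def __init__(self, dim):
        self.err = np.zeros(dim)

    def compress(self, grad):
        corrected = grad + self.err  # re-inject what earlier rounds dropped
        q = scaled_sign(corrected)   # message actually uploaded to the server
        self.err = corrected - q     # residual kept for the next round
        return q


class Server:
    """Runs Adam on the averaged worker messages, then quantizes its own update."""

    def __init__(self, dim, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        self.m, self.v, self.err = np.zeros(dim), np.zeros(dim), np.zeros(dim)
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.t = 0

    def step(self, msgs):
        self.t += 1
        g = np.mean(msgs, axis=0)  # average the quantized worker messages
        self.m = self.beta1 * self.m + (1 - self.beta1) * g
        self.v = self.beta2 * self.v + (1 - self.beta2) * g * g
        m_hat = self.m / (1 - self.beta1 ** self.t)  # standard Adam bias correction
        v_hat = self.v / (1 - self.beta2 ** self.t)
        update = -self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
        corrected = update + self.err  # server-side error feedback
        q = scaled_sign(corrected)     # message broadcast to every worker
        self.err = corrected - q
        return q


def run(num_workers=4, dim=10, steps=2000, noise=0.1, seed=0):
    """Toy problem: minimize 0.5 * ||x - x_star||^2 from noisy gradients."""
    rng = np.random.default_rng(seed)
    x_star = rng.normal(size=dim)
    x = np.zeros(dim)  # all workers apply the same broadcast, so one copy suffices
    workers = [Worker(dim) for _ in range(num_workers)]
    server = Server(dim)
    for _ in range(steps):
        msgs = [w.compress(x - x_star + noise * rng.normal(size=dim)) for w in workers]
        x = x + server.step(msgs)
    return np.linalg.norm(x_star), np.linalg.norm(x - x_star)


initial, final = run()
print(f"distance to optimum: {initial:.3f} -> {final:.3f}")
```

The two `err` buffers implement the two-way error feedback. Whatever the quantizer drops in one round is added back before the next compression, so the running sum of transmitted messages differs from the running sum of uncompressed ones only by the current residual.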
