LG · DC · ML · Aug 13, 2020

Step-Ahead Error Feedback for Distributed Training with Compressed Gradient

arXiv:2008.05823v3 · 17 citations
Originality: Highly original
AI Analysis

This paper addresses a communication bottleneck in distributed training of large neural networks, proposing techniques that improve training speed under gradient compression without sacrificing accuracy.

The paper tackles the gradient mismatch problem in distributed training with compressed gradients, showing that local error feedback can degrade performance compared to full-precision training. The proposed step-ahead and error-averaging methods resolve this issue: with common compression schemes, training takes fewer epochs than both full-precision training and local error feedback, without performance loss.

Although distributed machine learning methods can speed up the training of large deep neural networks, communication cost has become a non-negligible bottleneck that constrains performance. To address this challenge, communication-efficient distributed learning methods based on gradient compression were designed to reduce the communication cost, and more recently local error feedback was incorporated to compensate for the resulting performance loss. However, in this paper we show that local error feedback introduces a new "gradient mismatch" problem in centralized distributed training, which can degrade performance compared with full-precision training. To solve this problem, we propose two novel techniques, 1) step ahead and 2) error averaging, with rigorous theoretical analysis. Both our theoretical and empirical results show that the new methods handle the "gradient mismatch" problem. The experimental results show that, with common gradient compression schemes, we can even train faster in terms of training epochs than both full-precision training and local error feedback, without performance loss.
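For readers unfamiliar with the baseline the abstract criticizes, the following is a minimal sketch of local error feedback with compressed gradients in a centralized setting, as it is commonly described. It is not the paper's step-ahead or error-averaging method, which the abstract does not specify. The top-k compressor, the function names `top_k_compress` and `error_feedback_round`, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def top_k_compress(vec, k):
    """Keep the k largest-magnitude entries of vec and zero the rest.

    Top-k is an assumed example compressor; the abstract only says
    "common gradient compression schemes".
    """
    out = np.zeros_like(vec)
    if k > 0:
        idx = np.argpartition(np.abs(vec), -k)[-k:]
        out[idx] = vec[idx]
    return out

def error_feedback_round(grads, errors, lr, k):
    """One round of centralized training with local error feedback.

    grads  : list of per-worker gradient vectors
    errors : list of per-worker residual vectors (entries are replaced)
    Returns the averaged compressed update the server would apply.
    """
    messages = []
    for i, g in enumerate(grads):
        corrected = lr * g + errors[i]      # add back what was dropped earlier
        msg = top_k_compress(corrected, k)  # what the worker actually sends
        errors[i] = corrected - msg         # keep what was dropped this round
        messages.append(msg)
    return np.mean(messages, axis=0)

# Usage with synthetic gradients (hypothetical sizes and learning rate).
rng = np.random.default_rng(0)
n_workers, dim, k, lr = 4, 10, 2, 0.1
errors = [np.zeros(dim) for _ in range(n_workers)]
applied = np.zeros(dim)  # sum of compressed updates applied by the server
full = np.zeros(dim)     # sum of updates full-precision training would apply
for _ in range(5):
    grads = [rng.normal(size=dim) for _ in range(n_workers)]
    applied += error_feedback_round(grads, errors, lr, k)
    full += lr * np.mean(grads, axis=0)
```

In this sketch the applied updates and the full-precision updates differ by exactly the average residual still held on the workers, so no gradient information is discarded, only delayed. The residuals stay local to each worker while the model is shared, which is the setting in which the abstract's "gradient mismatch" problem arises.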
