LG DC IT OC ML

Jan 28, 2019

ErasureHead: Distributed Gradient Descent without Delays Using Approximate Gradient Coding

Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos

arXiv:1901.09671v116.559 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses delays in distributed machine learning systems and offers a practical improvement for large-scale training. It is an incremental advance that builds on existing gradient coding techniques.

The paper tackles system delays in distributed gradient descent by introducing ErasureHead, which uses approximate gradient coding to recover an inexact gradient at each iteration with higher delay tolerance than exact gradient coding. The authors report faster overall runtime and significant speedups over both standard and gradient coded GD in experiments on real-world datasets and distributed clusters.

We present ErasureHead, a new approach for distributed gradient descent (GD) that mitigates system delays by employing approximate gradient coding. Gradient coded distributed GD uses redundancy to exactly recover the gradient at each iteration from a subset of compute nodes. ErasureHead instead uses approximate gradient codes to recover an inexact gradient at each iteration, but with higher delay tolerance. Unlike prior work on gradient coding, we provide a performance analysis that combines both delay and convergence guarantees. We establish that down to a small noise floor, ErasureHead converges as quickly as distributed GD and has faster overall runtime under a probabilistic delay model. We conduct extensive experiments on real world datasets and distributed clusters and demonstrate that our method can lead to significant speedups over both standard and gradient coded GD.
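The idea of recovering an inexact gradient from whichever nodes respond in time can be illustrated with a small simulation. The sketch below is not the authors' implementation. It assumes a fractional repetition layout (workers split into groups, each group replicating the same data blocks), a scalar least-squares model, and an independent per-worker straggler probability. The names `partial_gradient`, `approx_coded_gradient`, `replication`, and `straggler_prob` are hypothetical and chosen only for illustration.

```python
import random


def partial_gradient(w, block):
    """Gradient of 0.5 * (w*x - y)^2 summed over one data block (scalar model)."""
    return sum((w * x - y) * x for x, y in block)


def approx_coded_gradient(w, blocks, replication, straggler_prob, rng):
    """Simulate one round of approximate decoding (illustrative assumption).

    Blocks are split into consecutive groups of size `replication`. Every
    worker in a group is assumed to hold all blocks of that group and to
    return the sum of their gradients. A group is recovered if at least one
    of its `replication` workers responds before the deadline; groups with
    no responder are simply dropped, which makes the result inexact.
    """
    num_groups = len(blocks) // replication
    total, recovered = 0.0, 0
    for g in range(num_groups):
        group_blocks = blocks[g * replication:(g + 1) * replication]
        # Each worker straggles independently with probability straggler_prob.
        responded = any(rng.random() >= straggler_prob for _ in range(replication))
        if responded:
            total += sum(partial_gradient(w, b) for b in group_blocks)
            recovered += 1
    return total, recovered, num_groups


# Toy data: four blocks of (x, y) pairs, replication factor 2, so two groups.
blocks = [
    [(1.0, 2.0), (2.0, 4.1)],
    [(3.0, 5.9), (4.0, 8.2)],
    [(5.0, 9.8)],
    [(6.0, 12.1)],
]
rng = random.Random(0)
exact = sum(partial_gradient(0.5, b) for b in blocks)
approx, recovered, num_groups = approx_coded_gradient(0.5, blocks, 2, 0.3, rng)
print(f"exact={exact:.3f} approx={approx:.3f} recovered={recovered}/{num_groups}")
```

With `straggler_prob` set to 0 every group is recovered and the result matches the full gradient. As the probability grows, more groups are dropped and the estimate drifts from the exact value, which is the trade-off between delay tolerance and gradient accuracy that the abstract describes.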
