IT DC LG SP · Mar 5, 2019

Gradient Coding with Clustering and Multi-message Communication

Emre Ozfatura, Deniz Gunduz, Sennur Ulukus

arXiv:1903.01974v1 · 11.341 citations

Originality: Incremental advance

AI Analysis

This work addresses wasted computational capacity in distributed machine learning on large datasets. It appears incremental, since it builds on existing gradient coding methods.

The paper tackles straggling workers that slow down distributed gradient descent. It proposes a gradient coding scheme with multi-message communication and clustering which, in numerical experiments, reduces the average completion time per iteration with minimal communication overhead.
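The redundancy idea behind gradient coding can be illustrated with the fractional repetition construction of Tandon et al. Note that this is a standard, well-known GC scheme and not the scheme proposed in this paper. The sketch below is a minimal illustration that assumes n workers, n data partitions, s tolerated stragglers, and (s + 1) dividing n. The function names and the toy integer "partial gradients" are hypothetical.

```python
from itertools import combinations
from typing import Dict, List

Vector = List[int]


def add(u: Vector, v: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]


def fractional_repetition_encode(partials: List[Vector], n: int, s: int) -> Dict[int, Vector]:
    """Each worker returns the sum of the partial gradients in its block.

    Assumption: n workers, n partitions, (s + 1) divides n. Workers are split
    into n // (s + 1) groups; every worker in group g is assigned the same
    s + 1 partitions, so each partition is replicated s + 1 times.
    """
    assert n % (s + 1) == 0 and len(partials) == n
    messages = {}
    for worker in range(n):
        group = worker // (s + 1)
        block = range(group * (s + 1), (group + 1) * (s + 1))
        msg = [0] * len(partials[0])
        for p in block:
            msg = add(msg, partials[p])
        messages[worker] = msg
    return messages


def decode(received: Dict[int, Vector], n: int, s: int) -> Vector:
    """Recover the full gradient from the messages of any n - s workers.

    A group has s + 1 members and at most s workers are missing, so every
    group has at least one survivor; summing one message per group gives the
    sum of all n partial gradients.
    """
    per_group: Dict[int, Vector] = {}
    for worker, msg in received.items():
        per_group.setdefault(worker // (s + 1), msg)
    assert len(per_group) == n // (s + 1), "not enough workers to decode"
    total = [0] * len(next(iter(received.values())))
    for msg in per_group.values():
        total = add(total, msg)
    return total


if __name__ == "__main__":
    n, s = 6, 1
    partials = [[i, 2 * i, 3 * i] for i in range(n)]  # toy partial gradients
    full = [sum(col) for col in zip(*partials)]
    messages = fractional_repetition_encode(partials, n, s)
    # Every subset of n - s non-straggling workers decodes the same gradient.
    for survivors in combinations(range(n), n - s):
        assert decode({w: messages[w] for w in survivors}, n, s) == full
    print(full)  # [15, 30, 45]
```

The price of tolerating s stragglers here is that each worker computes s + 1 partial gradients instead of one. In this single-message setting, any work done by the s slowest workers is simply discarded.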

Gradient descent (GD) methods are commonly employed in machine learning problems to optimize the parameters of the model in an iterative fashion. For problems with massive datasets, computations are distributed to many parallel computing servers (i.e., workers) to speed up GD iterations. While distributed computing can increase the computation speed significantly, the per-iteration completion time is limited by the slowest straggling workers. Coded distributed computing can mitigate straggling workers by introducing redundant computations; however, existing coded computing schemes are mainly designed against persistent stragglers, and partial computations at straggling workers are discarded, leading to wasted computational capacity. In this paper, we propose a novel gradient coding (GC) scheme which allows multiple coded computations to be conveyed from each worker to the master per iteration. We numerically show that the proposed GC with multi-message communication (MMC) together with clustering provides significant improvements in the average completion time (of each iteration), with minimal or no increase in the communication load.
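Why does sending multiple messages per iteration help? The toy simulation below compares two ways of handling the same cyclic assignment, in which worker i processes partitions i, ..., i + r - 1 (mod n) in that order.

- **Single-message cyclic-repetition GC.** Each worker sends one coded message after all r computations. The master decodes from the fastest n - r + 1 workers.
- **Simplified uncoded multi-message variant.** Each worker sends every partial gradient as soon as it is computed. The master stops once every partition has arrived from some worker.

This variant is a stand-in for illustration, not the coded MMC scheme with clustering proposed in the paper. The shifted-exponential delay model and all function names are hypothetical assumptions.

```python
import random
from typing import List


def cyclic_assignment(n: int, r: int) -> List[List[int]]:
    """Worker i processes partitions i, i+1, ..., i+r-1 (mod n), in that order."""
    return [[(i + k) % n for k in range(r)] for i in range(n)]


def sample_times(n: int, r: int, rng: random.Random) -> List[List[float]]:
    """Cumulative finish time of each of a worker's r computations.

    Hypothetical delay model: every computation takes 1 + Exp(1) time units,
    drawn independently per worker and per computation.
    """
    times = []
    for _ in range(n):
        t, cum = 0.0, []
        for _ in range(r):
            t += 1.0 + rng.expovariate(1.0)
            cum.append(t)
        times.append(cum)
    return times


def single_message_time(times: List[List[float]], n: int, r: int) -> float:
    """One message per worker after all r computations; cyclic-repetition GC
    (tolerating r - 1 stragglers) decodes from the fastest n - r + 1 workers."""
    finish = sorted(cum[-1] for cum in times)
    return finish[n - r]  # the (n - r + 1)-th smallest finish time


def multi_message_time(times: List[List[float]], assignment: List[List[int]], n: int) -> float:
    """One message per computation; done when every partition has arrived once."""
    first_arrival = [float("inf")] * n
    for worker, order in enumerate(assignment):
        for k, part in enumerate(order):
            first_arrival[part] = min(first_arrival[part], times[worker][k])
    return max(first_arrival)


if __name__ == "__main__":
    rng = random.Random(0)
    n, r, trials = 10, 3, 2000
    assignment = cyclic_assignment(n, r)
    sm = mm = 0.0
    for _ in range(trials):
        times = sample_times(n, r, rng)
        sm += single_message_time(times, n, r)
        mm += multi_message_time(times, assignment, n)
    print(f"single-message: {sm / trials:.2f}, multi-message: {mm / trials:.2f}")
```

Under this assignment, the multi-message time never exceeds the single-message time in any realization. By the time n - r + 1 workers have finished, each partition's r holders include at least one finished worker, so the multi-message master already has that partition. The gain comes from using the partial work of stragglers. The cost is that this simplified variant sends up to r messages per worker per iteration, whereas the abstract reports that the proposed scheme keeps any increase in communication load minimal.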

