LG | DC | Mar 19, 2024

Distributed Learning based on 1-Bit Gradient Coding in the Presence of Stragglers

arXiv:2403.14716v1 | 11 citations | IEEE Trans Commun
Originality: Incremental advance
AI Analysis

This work addresses communication bottlenecks in distributed machine learning systems, particularly in settings with stragglers, but it is incremental because it builds on existing gradient coding methods.

The paper tackles the problem of high communication overhead in distributed learning with stragglers by proposing a 1-bit gradient coding method, which reduces communication burden while theoretically guaranteeing convergence for convex and nonconvex loss functions and empirically outperforming baselines under the same overhead.

This paper considers the problem of distributed learning (DL) in the presence of stragglers. For this problem, DL methods based on gradient coding have been widely investigated; they distribute the training data redundantly across the workers to guarantee convergence when some workers are stragglers. However, these methods require the workers to transmit real-valued vectors during learning, which induces a very high communication burden. To overcome this drawback, we propose a novel DL method based on 1-bit gradient coding (1-bit GC-DL), where 1-bit data encoded from the locally computed gradients are transmitted by the workers to reduce the communication overhead. We provide theoretical convergence guarantees for the proposed method for both convex and nonconvex loss functions. It is shown empirically that 1-bit GC-DL outperforms the baseline methods, attaining better learning performance under the same communication overhead.
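The idea in the abstract can be illustrated with a small sketch. It is not the paper's encoder or decoder, which this page does not describe. It assumes a cyclic redundant data assignment, a sign-based 1-bit encoding of each worker's summed local gradient, a majority-vote update at the server, and a least-squares loss. All function names and parameters below are hypothetical.

```python
import random


def sign_bits(vec):
    """Encode each coordinate as one bit: 1 if non-negative, else 0."""
    return [1 if v >= 0 else 0 for v in vec]


def cyclic_assignment(num_workers, redundancy):
    """Assumed allocation: worker i holds `redundancy` consecutive partitions."""
    return [[(i + j) % num_workers for j in range(redundancy)]
            for i in range(num_workers)]


def partition_gradient(w, partition):
    """Gradient of sum 0.5 * (w.x - y)^2 over one partition (assumed loss)."""
    grad = [0.0] * len(w)
    for x, y in partition:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for k, xk in enumerate(x):
            grad[k] += err * xk
    return grad


def worker_message(w, partitions, held):
    """Sum the gradients of the held partitions, then send only the sign bits."""
    total = [0.0] * len(w)
    for p in held:
        g = partition_gradient(w, partitions[p])
        total = [t + gk for t, gk in zip(total, g)]
    return sign_bits(total)


def server_step(w, messages, lr):
    """Majority vote per coordinate over the bits that actually arrived."""
    if not messages:  # every worker straggled in this round
        return list(w)
    new_w = []
    for k, wk in enumerate(w):
        votes = sum(2 * m[k] - 1 for m in messages)  # map bits to +1 / -1
        direction = (votes > 0) - (votes < 0)
        new_w.append(wk - lr * direction)
    return new_w


def train(partitions, dim, redundancy=2, straggler_prob=0.3,
          lr=0.05, rounds=200, seed=0):
    """Run rounds in which each worker independently straggles at random."""
    rng = random.Random(seed)
    assignment = cyclic_assignment(len(partitions), redundancy)
    w = [0.0] * dim
    for _ in range(rounds):
        messages = [worker_message(w, partitions, held)
                    for held in assignment
                    if rng.random() >= straggler_prob]
        w = server_step(w, messages, lr)
    return w
```

In this sketch each worker uploads one bit per model coordinate instead of one real number, and the redundancy means a partition still influences the vote when one of the workers holding it straggles. Calling `train(partitions, dim)` with a list of partitions, each a list of `(x, y)` pairs, returns the final weight vector.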

