Heekang Song

h-index2
2papers

2 Papers

7.2SYMay 15
Communication-Efficient Approximate Gradient Coding for Distributed Learning in Heterogeneous Systems

Heekang Song, Wan Choi

We propose a communication-efficient optimally structured gradient coding scheme to jointly address straggler resilience and communication efficiency in heterogeneous distributed learning. By establishing a unified framework that simultaneously optimizes gradient coding and quantization, we formulate an optimization problem to minimize residual error subject to an unbiasedness constraint. We rigorously establish the joint global optimum by deriving a closed-form code structure coupled with an optimal bit allocation strategy, while simultaneously proposing a low-complexity bit allocation algorithm that efficiently yields near-optimal performance. We provide rigorous convergence analysis for convex and smooth functions. Experiments on the COCO dataset demonstrate that our joint design significantly accelerates convergence and enhances communication efficiency compared to existing baselines.

SYOct 26, 2025
Approximate Gradient Coding for Distributed Learning with Heterogeneous Stragglers

Heekang Song, Wan Choi

In this paper, we propose an optimally structured gradient coding scheme to mitigate the straggler problem in distributed learning. Conventional gradient coding methods often assume homogeneous straggler models or rely on excessive data replication, limiting performance in real-world heterogeneous systems. To address these limitations, we formulate an optimization problem minimizing residual error while ensuring unbiased gradient estimation by explicitly considering individual straggler probabilities. We derive closed-form solutions for optimal encoding and decoding coefficients via Lagrangian duality and convex optimization, and propose data allocation strategies that reduce both redundancy and computation load. We also analyze convergence behavior for $λ$-strongly convex and $μ$-smooth loss functions. Numerical results show that our approach significantly reduces the impact of stragglers and accelerates convergence compared to existing methods.