Soft BIBD and Product Gradient Codes
This work addresses straggler robustness in distributed machine learning and offers practitioners more practical gradient codes, although it is incremental, building on existing BIBD-based methods.
The paper tackles the limited availability of balanced incomplete block design (BIBD) gradient codes for distributed machine learning by proposing two new constructions, one that relaxes the BIBD constraints and one that takes Kronecker products of existing codes, enabling flexible system parameters while maintaining comparable error performance.
Gradient coding is a coding-theoretic framework that provides robustness against slow or unresponsive machines, known as stragglers, in distributed machine learning applications. Recently, Kadhe et al. proposed a gradient code based on a combinatorial design called a balanced incomplete block design (BIBD), which was shown to outperform many existing gradient codes in worst-case adversarial straggling scenarios. However, the parameters for which such BIBD constructions exist are very limited. In this paper, we aim to overcome this limitation by constructing gradient codes that exist for a wide range of system parameters while retaining the superior performance of BIBD gradient codes. We propose two such constructions: one is a probabilistic construction that relaxes the stringent constraints of BIBD gradient codes, and the other takes the Kronecker product of existing gradient codes. The proposed gradient codes allow flexible choices of system parameters while retaining comparable error performance.
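The setting above can be made concrete with a small numerical sketch. It assumes a common formulation of approximate gradient coding that may not match the paper's exact definitions: an n x k assignment matrix B whose row i marks the data partitions held by worker i, and a master that, given the surviving rows B_S, picks the decoding vector a minimizing ||B_S^T a - 1||^2. The Fano plane (a (7, 3, 1)-BIBD), the block-to-worker mapping, the helper names (`incidence_matrix`, `decoding_error`, `worst_case_error`, `kronecker_code`), and the unnormalized error are all hypothetical illustrative choices, not taken from the paper. The Kronecker step only shows how `np.kron` combines two assignment matrices; it is not claimed to be the authors' product construction, and the probabilistic construction is not sketched.

```python
import itertools

import numpy as np

# Hypothetical example code: the 7 blocks of the Fano plane, a (7, 3, 1)-BIBD.
# Assumed mapping: worker i holds the 3 data partitions listed in block i.
FANO_BLOCKS = [(0, 1, 3), (1, 2, 4), (2, 3, 5), (3, 4, 6),
               (4, 5, 0), (5, 6, 1), (6, 0, 2)]


def incidence_matrix(blocks, num_points):
    """0/1 assignment matrix B: B[i, j] = 1 iff worker i holds partition j."""
    B = np.zeros((len(blocks), num_points))
    for i, block in enumerate(blocks):
        B[i, list(block)] = 1.0
    return B


def decoding_error(B, stragglers=()):
    """Squared error min_a ||B_S^T a - 1||^2 over the non-straggling workers S.

    Zero error means the full gradient (sum over all partitions) is recovered
    exactly. This unnormalized error is an assumed metric for illustration.
    """
    dead = set(stragglers)
    survivors = [i for i in range(B.shape[0]) if i not in dead]
    A = B[survivors].T                       # k x |S| system matrix
    ones = np.ones(B.shape[1])
    a, *_ = np.linalg.lstsq(A, ones, rcond=None)
    residual = A @ a - ones
    return float(residual @ residual)


def worst_case_error(B, s):
    """Adversarial straggling: maximum decoding error over all s-subsets of workers."""
    n = B.shape[0]
    return max(decoding_error(B, S) for S in itertools.combinations(range(n), s))


def kronecker_code(B1, B2):
    """Combine two assignment matrices with a Kronecker product.

    If a1^T B1 = 1 and a2^T B2 = 1, then (a1 kron a2)^T (B1 kron B2) = 1,
    so exact recovery without stragglers carries over to the product code.
    The product has n1*n2 workers, k1*k2 partitions, and per-worker load d1*d2.
    """
    return np.kron(B1, B2)


if __name__ == "__main__":
    B = incidence_matrix(FANO_BLOCKS, 7)
    for s in (1, 2):
        print(f"Fano (7 workers), s={s}: worst-case error {worst_case_error(B, s):.4f}")
    P = kronecker_code(B, B)
    print(f"Fano x Fano ({P.shape[0]} workers, load {int(P[0].sum())}), "
          f"s=1: worst-case error {worst_case_error(P, 1):.4f}")
```

The exhaustive enumeration in `worst_case_error` grows combinatorially in the number of stragglers s, so this brute-force evaluation is only practical for small codes such as the ones above. The Kronecker product is appealing here because it produces codes with n1*n2 workers from any two existing codes, which is one way to reach parameter combinations that no single BIBD provides.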