DC, IT, LG · Nov 6, 2018

Erasure coding for distributed matrix multiplication for matrices with bounded entries

arXiv:1811.02144v2 · 20 citations
Originality: Incremental advance
AI Analysis

This work addresses straggler mitigation in distributed matrix multiplication, a computation used across scientific domains, and offers an incremental improvement over prior coding methods.

The paper tackles the problem of straggler mitigation in distributed matrix multiplication by introducing a novel erasure coding strategy for matrices with bounded entries, demonstrating a tradeoff between entry bounds and recovery threshold and validating benefits through cloud experiments.

Distributed matrix multiplication is widely used in several scientific domains. It is well recognized that computation times on distributed clusters are often dominated by the slowest workers (called stragglers). Recent work has demonstrated that straggler mitigation can be viewed as a problem of designing erasure codes. For matrices $\mathbf A$ and $\mathbf B$, the technique essentially maps the computation of $\mathbf A^T \mathbf B$ into the multiplication of smaller (coded) submatrices. The stragglers are treated as erasures in this process. The computation can be completed as long as a certain number of workers (called the recovery threshold) complete their assigned tasks. We present a novel coding strategy for this problem when the absolute values of the matrix entries are sufficiently small. We demonstrate a tradeoff between the assumed absolute value bounds on the matrix entries and the recovery threshold. At one extreme, we are optimal with respect to the recovery threshold and on the other extreme, we match the threshold of prior work. Experimental results on cloud-based clusters validate the benefits of our method.
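The mechanism described in the abstract can be sketched in a few lines. The example below is a minimal, generic illustration and not the paper's construction: it assumes $\mathbf A$ is split into two column blocks $\mathbf A_0, \mathbf A_1$, so that $\mathbf A^T \mathbf B$ is $\mathbf A_0^T \mathbf B$ stacked over $\mathbf A_1^T \mathbf B$, and each worker receives the coded block $\mathbf A_0 + x \mathbf A_1$ for its own integer point $x$. Any two finished workers then suffice (recovery threshold 2). The last part is a toy illustration, under the assumption that every entry of $\mathbf A_0^T \mathbf B$ lies in $[-s/2, s/2)$, of why bounded entries can lower the threshold: with $x = s$ a single result can be unpacked. All function names, sizes, and the choice $s = 256$ are hypothetical.

```python
import random

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    yt = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in yt] for row in x]

def encode(a0, a1, x):
    """Coded block A0 + x * A1 for the worker with evaluation point x."""
    return [[p + x * q for p, q in zip(r0, r1)] for r0, r1 in zip(a0, a1)]

def worker(coded_a, b):
    """A worker returns coded_a^T B = A0^T B + x * A1^T B."""
    return matmul(transpose(coded_a), b)

def decode(x_i, r_i, x_j, r_j):
    """Recover (A0^T B, A1^T B) from any two results with distinct points."""
    c1 = [[(p - q) // (x_i - x_j) for p, q in zip(ri, rj)]
          for ri, rj in zip(r_i, r_j)]
    c0 = [[p - x_i * q for p, q in zip(ri, r1)] for ri, r1 in zip(r_i, c1)]
    return c0, c1

def unpack(r, s):
    """Split r = C0 + s * C1, assuming every entry of C0 is in [-s/2, s/2)."""
    c0 = [[(v + s // 2) % s - s // 2 for v in row] for row in r]
    c1 = [[(v - w) // s for v, w in zip(rv, rw)] for rv, rw in zip(r, c0)]
    return c0, c1

rng = random.Random(0)
rows, half, cols = 4, 2, 3
a0 = [[rng.randint(-5, 5) for _ in range(half)] for _ in range(rows)]
a1 = [[rng.randint(-5, 5) for _ in range(half)] for _ in range(rows)]
b = [[rng.randint(-5, 5) for _ in range(cols)] for _ in range(rows)]

# Uncoded reference blocks of A^T B.
true_c0 = matmul(transpose(a0), b)
true_c1 = matmul(transpose(a1), b)

# Four workers; those at x = 1 and x = 3 straggle and are treated as erasures.
points = [1, 2, 3, 4]
results = {x: worker(encode(a0, a1, x), b) for x in points}
c0, c1 = decode(2, results[2], 4, results[4])

# Bounded entries: |C0 entry| <= 4 * 5 * 5 = 100 < 256 / 2, so one worker
# evaluated at x = s = 256 is enough to recover both blocks.
s = 256
packed_c0, packed_c1 = unpack(worker(encode(a0, a1, s), b), s)
```

The trade-off is visible even in this toy setting: the packed variant needs only one result but works only while the entry bound holds and requires larger integers, whereas the two-point decoder makes no assumption on entry size.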
