DCLGNEOCMLMar 8, 2017

Byzantine-Tolerant Machine Learning

arXiv:1703.02757v184 citations
Originality Highly original
AI Analysis

This addresses a critical robustness issue for distributed machine learning systems, enabling reliable training in the presence of malicious or faulty workers.

The paper tackles the problem of arbitrary (Byzantine) failures in distributed machine learning, showing that existing linear gradient update rules cannot tolerate even a single failure, and proposes Krum, a new update rule that guarantees convergence despite up to f Byzantine workers with time complexity O(n^2 * (d + log n)).

The growth of data, the need for scalability and the complexity of models used in modern machine learning calls for distributed implementations. Yet, as of today, distributed machine learning frameworks have largely ignored the possibility of arbitrary (i.e., Byzantine) failures. In this paper, we study the robustness to Byzantine failures at the fundamental level of stochastic gradient descent (SGD), the heart of most machine learning algorithms. Assuming a set of $n$ workers, up to $f$ of them being Byzantine, we ask how robust can SGD be, without limiting the dimension, nor the size of the parameter space. We first show that no gradient descent update rule based on a linear combination of the vectors proposed by the workers (i.e, current approaches) tolerates a single Byzantine failure. We then formulate a resilience property of the update rule capturing the basic requirements to guarantee convergence despite $f$ Byzantine workers. We finally propose Krum, an update rule that satisfies the resilience property aforementioned. For a $d$-dimensional learning problem, the time complexity of Krum is $O(n^2 \cdot (d + \log n))$.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes