LG, ML
Feb 4, 2022

SignSGD: Fault-Tolerance to Blind and Byzantine Adversaries

arXiv:2202.02085v2
2 citations
Has Code
Originality: Highly original
AI Analysis

This paper addresses fault tolerance in distributed machine learning, targeting applications that need reliable training in adversarial environments. It represents a strong, specific gain rather than a broad paradigm shift.

The paper tackles the vulnerability of distributed learning to faulty or malicious devices (Byzantine adversaries), which can prevent convergence. It shows that the SignSGD algorithm is robust to such adversaries, supporting the claim with a proven upper bound on the convergence rate and with empirical validation.

Distributed learning has become a necessity for training ever-growing models, since it spreads computation across several devices. However, some of these devices can be faulty, deliberately or not, which prevents proper convergence. In fact, the baseline distributed SGD algorithm does not converge in the presence of a single Byzantine adversary. In this article we focus on the more robust SignSGD algorithm, derived from SGD. We provide an upper bound on the convergence rate of SignSGD, proving that this variant is robust to Byzantine adversaries. We implemented SignSGD along with Byzantine strategies that attempt to crush the learning process, and we provide empirical observations from our experiments to support our theory. Our code is available on GitHub at https://github.com/jasonakoun/signsgd-fault-tolerance, and our experiments are reproducible using the provided parameters.
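To make the contrast in the abstract concrete, here is a minimal toy sketch, not the code from the linked repository. It compares averaging-based distributed SGD with a sign-based update in the presence of one Byzantine worker. Everything specific in it is an assumption made for illustration: the quadratic objective, the Gaussian gradient noise, the worker counts, the learning rate, the hypothetical attack in `byzantine_gradient`, and the use of a coordinate-wise majority vote over signs as the aggregation rule.

```python
import random


def sign(x):
    # Returns -1, 0, or +1.
    return (x > 0) - (x < 0)


def honest_gradient(w, rng, noise=0.5):
    # Assumed toy objective f(w) = 0.5 * ||w||^2, whose gradient is w.
    # Gaussian noise mimics a stochastic (mini-batch) gradient.
    return [wi + rng.gauss(0.0, noise) for wi in w]


def byzantine_gradient(w, scale=100.0):
    # Hypothetical attack: a large vector pointing against the true gradient.
    return [-scale * wi for wi in w]


def mean_aggregate(grads):
    # Baseline distributed SGD: average the workers' gradients.
    n = len(grads)
    return [sum(col) / n for col in zip(*grads)]


def majority_vote(grads):
    # Sign-based aggregation (assumed here): each worker contributes only
    # the sign of each coordinate, and the server takes the majority sign.
    return [sign(sum(sign(g) for g in col)) for col in zip(*grads)]


def train(aggregate, n_honest=8, n_byz=1, steps=300, lr=0.05, seed=0):
    rng = random.Random(seed)
    w = [5.0, -3.0, 2.0]
    for _ in range(steps):
        grads = [honest_gradient(w, rng) for _ in range(n_honest)]
        grads += [byzantine_gradient(w) for _ in range(n_byz)]
        update = aggregate(grads)
        w = [wi - lr * ui for wi, ui in zip(w, update)]
    # Distance to the optimum w* = 0.
    return sum(wi * wi for wi in w) ** 0.5


norm_mean = train(mean_aggregate)  # averaging with one Byzantine worker
norm_sign = train(majority_vote)   # sign majority vote with the same worker
print("mean aggregation, final ||w||:", norm_mean)
print("sign majority vote, final ||w||:", norm_sign)
```

In this toy setting the single adversary dominates the average because its vector can be arbitrarily large, so the iterate moves away from the optimum. Under the sign-based rule the same adversary contributes only one vote per coordinate, so the iterate stays near the optimum. The sketch illustrates the intuition only; it does not reproduce the paper's bound or its experiments.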

Code Implementations: 3 repos