LG | DC | ML | Sep 10, 2019

Byzantine-Resilient Stochastic Gradient Descent for Distributed Learning: A Lipschitz-Inspired Coordinate-wise Median Approach

arXiv:1909.04532v1 | 51 citations
Originality: Highly original
AI Analysis

This paper addresses security in distributed machine learning systems, offering a practical and efficient defense for settings with malicious participants.

The paper tackles Byzantine attacks in distributed stochastic gradient descent by proposing a Lipschitz-inspired coordinate-wise median approach (LICM-SGD). The method resists up to half of the workers being attackers, converges almost surely to a stationary region in non-convex settings, and has $O(md)$ time-complexity, the same as standard SGD under no attacks. In experiments on multi-class logistic regression and CNNs with MNIST and CIFAR-10, it outperforms existing methods.

In this work, we consider the resilience of distributed algorithms based on stochastic gradient descent (SGD) in distributed learning with potentially Byzantine attackers, who could send arbitrary information to the parameter server to disrupt the training process. Toward this end, we propose a new Lipschitz-inspired coordinate-wise median approach (LICM-SGD) to mitigate Byzantine attacks. We show that our LICM-SGD algorithm can resist up to half of the workers being Byzantine attackers, while still converging almost surely to a stationary region in non-convex settings. Also, our LICM-SGD method does not require any information about the number of attackers and the Lipschitz constant, which makes it attractive for practical implementations. Moreover, our LICM-SGD method enjoys the optimal $O(md)$ computational time-complexity in the sense that the time-complexity is the same as that of the standard SGD under no attacks. We conduct extensive experiments to show that our LICM-SGD algorithm consistently outperforms existing methods in training multi-class logistic regression and convolutional neural networks with MNIST and CIFAR-10 datasets. In our experiments, LICM-SGD also achieves a much faster running time thanks to its low computational time-complexity.
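To make the coordinate-wise median idea concrete, here is a minimal NumPy sketch of the aggregation step a parameter server could apply to worker gradients. It is an illustration under stated assumptions, not the authors' LICM-SGD: the abstract does not describe the Lipschitz-inspired part, so that part is omitted and only a plain coordinate-wise median is shown. The names `coordinate_wise_median` and `sgd_step`, the learning rate, and the toy gradients are all hypothetical.

```python
import numpy as np


def coordinate_wise_median(gradients):
    """Aggregate worker gradients by taking the median of each coordinate.

    gradients: array-like of shape (m, d), one row per worker.
    Returns a vector of length d.
    """
    grads = np.asarray(gradients, dtype=float)
    if grads.ndim != 2:
        raise ValueError("expected an (m, d) array of worker gradients")
    return np.median(grads, axis=0)


def sgd_step(params, gradients, lr=0.1):
    """One hypothetical server update using the median-aggregated gradient."""
    return params - lr * coordinate_wise_median(gradients)


# Toy setup (assumed values): 9 workers, 4 of them Byzantine.
rng = np.random.default_rng(0)
m, d, n_byzantine = 9, 4, 4
true_grad = np.array([1.0, -2.0, 0.5, 3.0])

# Honest workers report the true gradient plus small noise.
honest = true_grad + 0.01 * rng.standard_normal((m - n_byzantine, d))
# Byzantine workers report an arbitrary, very large vector.
byzantine = np.full((n_byzantine, d), 1e6)
all_grads = np.vstack([honest, byzantine])

median_agg = coordinate_wise_median(all_grads)
mean_agg = all_grads.mean(axis=0)

print("median aggregate:", median_agg)
print("mean aggregate:  ", mean_agg)
```

Because the honest workers form a strict majority in this toy setup, each coordinate's median falls within the range of honest values, whereas the plain mean is pulled toward the attackers' values.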

