LGDCMLAug 11, 2020

Byzantine Fault-Tolerant Distributed Machine Learning Using Stochastic Gradient Descent (SGD) and Norm-Based Comparative Gradient Elimination (CGE)

arXiv:2008.04699v211 citations
Originality Incremental advance
AI Analysis

This addresses the issue of faulty agents disrupting distributed machine learning for researchers and practitioners, but it is incremental as it builds on existing gradient-filter approaches.

The paper tackles the problem of Byzantine fault-tolerance in distributed stochastic gradient descent (D-SGD) by proposing a norm-based gradient-filter called comparative gradient elimination (CGE), which guarantees fault-tolerance against a bounded fraction of Byzantine agents and is computationally simpler than existing methods, with empirical simulations showing comparable fault-tolerance on neural networks.

This paper considers the Byzantine fault-tolerance problem in distributed stochastic gradient descent (D-SGD) method - a popular algorithm for distributed multi-agent machine learning. In this problem, each agent samples data points independently from a certain data-generating distribution. In the fault-free case, the D-SGD method allows all the agents to learn a mathematical model best fitting the data collectively sampled by all agents. We consider the case when a fraction of agents may be Byzantine faulty. Such faulty agents may not follow a prescribed algorithm correctly, and may render traditional D-SGD method ineffective by sharing arbitrary incorrect stochastic gradients. We propose a norm-based gradient-filter, named comparative gradient elimination (CGE), that robustifies the D-SGD method against Byzantine agents. We show that the CGE gradient-filter guarantees fault-tolerance against a bounded fraction of Byzantine agents under standard stochastic assumptions, and is computationally simpler compared to many existing gradient-filters such as multi-KRUM, geometric median-of-means, and the spectral filters. We empirically show, by simulating distributed learning on neural networks, that the fault-tolerance of CGE is comparable to that of existing gradient-filters. We also empirically show that exponential averaging of stochastic gradients improves the fault-tolerance of a generic gradient-filter.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes