A Robust Classification Framework for Byzantine-Resilient Stochastic Gradient Descent
This work addresses the problem of secure and reliable distributed machine learning in systems vulnerable to adversarial attacks or faults. It represents a significant rather than incremental advance in Byzantine resilience.
The paper tackles Byzantine fault tolerance in distributed stochastic gradient descent by proposing a Robust Gradient Classification Framework (RGCF), which uses a pattern recognition filter to classify gradients as Byzantine based on their direction. The framework is robust to an arbitrary number of Byzantine workers in both convex and non-convex settings, improving on prior work that tolerates at most 50% Byzantine workers.
This paper proposes a Robust Gradient Classification Framework (RGCF) for Byzantine fault tolerance in distributed stochastic gradient descent. The framework consists of a pattern recognition filter that we train to classify individual gradients as Byzantine using their direction alone. This filter is robust to an arbitrary number of Byzantine workers in both convex and non-convex optimisation settings, a significant improvement on prior work, which is robust to Byzantine faults only when at most 50% of the workers are Byzantine. The solution does not require an estimate of the number of Byzantine workers. Its running time does not depend on the number of workers, so it scales to training instances with a large number of workers without a loss in performance. We validate our solution by training convolutional neural networks on the MNIST dataset in the presence of Byzantine workers.
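To make the idea of per-gradient, direction-only filtering concrete, the following is a minimal sketch and not the paper's trained filter. It assumes a hypothetical reference direction (for example, a trusted gradient estimate) and replaces the learned classifier with a simple cosine-similarity threshold. The names `unit`, `is_byzantine`, `filtered_mean`, `reference`, and `threshold` are all illustrative assumptions. Because each gradient is judged on its own, the decision for one gradient does not depend on how many workers there are or on how many of them are Byzantine.

```python
import math


def unit(v):
    """Scale v to unit length so that only its direction remains."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("a zero gradient has no direction")
    return [x / norm for x in v]


def is_byzantine(gradient, reference, threshold=0.0):
    """Hypothetical stand-in for the trained filter (an assumption):
    flag a gradient whose cosine similarity to the reference
    direction is at or below the threshold."""
    g, r = unit(gradient), unit(reference)
    cosine = sum(a * b for a, b in zip(g, r))
    return cosine <= threshold


def filtered_mean(gradients, reference, threshold=0.0):
    """Average only the gradients the filter accepts.
    Each gradient is classified independently of the others."""
    kept = [g for g in gradients if not is_byzantine(g, reference, threshold)]
    if not kept:
        return None  # nothing accepted; the caller decides how to proceed
    dim = len(kept[0])
    return [sum(g[i] for g in kept) / len(kept) for i in range(dim)]


# Toy data: 3 of 5 workers are Byzantine, i.e. more than 50%.
honest = [[1.0, 0.9], [0.8, 1.1]]
byzantine = [[-5.0, -5.0], [-4.0, -6.0], [-6.0, -4.0]]
reference = [1.0, 1.0]  # assumed trusted direction, illustration only

result = filtered_mean(honest + byzantine, reference)
```

In this toy case the Byzantine workers are a majority, yet the filtered average is built from the two honest gradients only. A plain mean over all five would point in the opposite direction.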