LG DC ML, Mar 17, 2019

Zeno++: Robust Fully Asynchronous SGD

arXiv:1903.07020v5, 20.8138 citations, Has Code

Originality: Highly original

AI Analysis

This work addresses the challenge of secure and efficient distributed machine learning in unreliable environments. It improves on prior methods by removing unrealistic restrictions on worker-server communication.

The paper tackles Byzantine failures in asynchronous distributed optimization. It proposes Zeno++, a robust SGD method that accepts fully asynchronous updates from anonymous workers, tolerates arbitrarily stale updates, and allows an unbounded number of Byzantine workers. The authors prove convergence for non-convex problems and report that Zeno++ outperforms existing approaches in experiments.

We propose Zeno++, a new robust asynchronous Stochastic Gradient Descent (SGD) procedure which tolerates Byzantine failures of the workers. In contrast to previous work, Zeno++ removes some unrealistic restrictions on worker-server communications, allowing for fully asynchronous updates from anonymous workers, arbitrarily stale worker updates, and the possibility of an unbounded number of Byzantine workers. The key idea is to estimate the descent of the loss value after the candidate gradient is applied, where large descent values indicate that the update results in optimization progress. We prove the convergence of Zeno++ for non-convex problems under Byzantine failures. Experimental results show that Zeno++ outperforms existing approaches.
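The key idea in the abstract, scoring a candidate gradient by the estimated loss descent it would produce, can be illustrated with a minimal sketch. This is not the paper's algorithm: the toy quadratic loss, the server-side validation batch, the zero acceptance threshold, and the function names `loss`, `descent_score`, and `server_step` are all assumptions made for illustration.

```python
# Hypothetical sketch: accept a worker's gradient only if applying it
# is estimated to decrease the loss on a small server-held batch.
# The loss, batch, threshold, and all names are illustrative assumptions.

def loss(x, batch):
    """Toy loss: half the mean squared distance from x to the batch points."""
    return sum((x - z) ** 2 for z in batch) / (2 * len(batch))

def descent_score(x, g, lr, batch):
    """Estimated loss decrease if candidate gradient g is applied at x."""
    return loss(x, batch) - loss(x - lr * g, batch)

def server_step(x, g, lr, batch, threshold=0.0):
    """Apply g only if its estimated descent exceeds the threshold."""
    if descent_score(x, g, lr, batch) > threshold:
        return x - lr * g, True
    return x, False

validation_batch = [1.0, 2.0, 3.0]   # assumed server-side data, mean 2.0
x0 = 5.0
honest_g = x0 - 2.0                  # true gradient of the toy loss at x0
byzantine_g = -honest_g              # adversarial update pointing uphill

x1, accepted_honest = server_step(x0, honest_g, 0.1, validation_batch)
x2, accepted_byz = server_step(x0, byzantine_g, 0.1, validation_batch)
```

In this toy setting the honest gradient lowers the validation loss and is applied, while the sign-flipped gradient raises it and is discarded. Because each candidate is judged on its own, the check does not depend on how many workers are Byzantine or on updates arriving in lockstep.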
