LG, CR, DC - Feb 28, 2020

Distributed Momentum for Byzantine-resilient Learning

arXiv:2003.00010v2, 22 citations
AI Analysis

This work addresses robustness in distributed machine learning against Byzantine failures, presenting an incremental improvement over existing methods.

The paper tackles the problem of Byzantine-resilient distributed learning by showing that using momentum at the worker side reduces the variance-norm ratio of gradient estimation, strengthening robustness. Experimental results demonstrate improved robustness in distributed SGD.
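The variance reduction described above can be illustrated with a small simulation. This is only a sketch under toy assumptions that are not taken from the paper: every worker sees the same fixed true gradient plus independent Gaussian noise, and "variance-norm ratio" is read as the total variance across workers divided by the squared norm of the mean. The paper's exact definition may differ, and the names `variance_norm_ratio` and `simulate` are hypothetical.

```python
import numpy as np

def variance_norm_ratio(samples):
    """Total variance across workers divided by the squared norm of the mean.

    samples has shape (n_workers, dim). This is an assumed reading of the
    "variance-norm ratio", not necessarily the paper's definition.
    """
    mean = samples.mean(axis=0)
    total_variance = ((samples - mean) ** 2).sum(axis=1).mean()
    return total_variance / np.dot(mean, mean)

def simulate(n_workers=2000, dim=10, steps=50, mu=0.9, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    true_grad = np.ones(dim)  # toy assumption: the true gradient never changes
    momentum = np.zeros((n_workers, dim))
    for _ in range(steps):
        # Each worker draws an independent noisy estimate of the true gradient.
        grads = true_grad + sigma * rng.standard_normal((n_workers, dim))
        # Worker-side momentum: each worker accumulates its own gradients.
        momentum = mu * momentum + grads
    # Compare what the server would receive in each case at the last step.
    return variance_norm_ratio(grads), variance_norm_ratio(momentum)

raw_ratio, momentum_ratio = simulate()
print(f"raw gradients:        {raw_ratio:.3f}")
print(f"worker-side momentum: {momentum_ratio:.3f}")
```

In this toy setting the momentum vectors accumulate the shared signal coherently while the independent noise only partly adds up, so after many steps the ratio shrinks by a factor close to (1 - mu) / (1 + mu). Real training has a gradient that changes with the parameters, so the effect there is less clean.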

Momentum is a variant of gradient descent that has been proposed for its benefits on convergence. In a distributed setting, momentum can be implemented at either the server or the worker side. When the aggregation rule used by the server is linear, commutativity with addition makes both deployments equivalent. Robustness and privacy are, however, among the motivations for abandoning linear aggregation rules. In this work, we demonstrate the robustness benefits of using momentum at the worker side. We first prove that computing momentum at the workers reduces the variance-norm ratio of the gradient estimation at the server, strengthening Byzantine-resilient aggregation rules. We then provide an extensive experimental demonstration of the robustness effect of worker-side momentum on distributed SGD.
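The equivalence claim for linear aggregation, and its failure for a non-linear rule, can be checked numerically. The sketch below is illustrative only: it uses the coordinate-wise median as a stand-in non-linear rule (the abstract does not name a specific one), it feeds both deployments the same pre-generated random gradients rather than training a model, and the helper `run` is hypothetical.

```python
import numpy as np

def run(aggregate, side, grads_per_step, mu=0.9):
    """Return the sequence of updates applied by the server.

    grads_per_step has shape (steps, n_workers, dim).
    side="worker" applies momentum before aggregation,
    side="server" applies it after aggregation.
    """
    _, n_workers, dim = grads_per_step.shape
    worker_m = np.zeros((n_workers, dim))
    server_m = np.zeros(dim)
    updates = []
    for grads in grads_per_step:
        if side == "worker":
            # Each worker keeps its own momentum and sends it to the server.
            worker_m = mu * worker_m + grads
            update = aggregate(worker_m)
        else:
            # Workers send raw gradients; the server keeps a single momentum.
            server_m = mu * server_m + aggregate(grads)
            update = server_m
        updates.append(update)
    return np.array(updates)

def mean(x):
    return x.mean(axis=0)        # linear aggregation rule

def median(x):
    return np.median(x, axis=0)  # non-linear rule, chosen here as an example

rng = np.random.default_rng(0)
grads = rng.standard_normal((20, 7, 5))  # 20 steps, 7 workers, 5 dimensions

mean_equal = np.allclose(run(mean, "worker", grads), run(mean, "server", grads))
median_equal = np.allclose(run(median, "worker", grads), run(median, "server", grads))
print("averaging: deployments match ->", mean_equal)
print("median:    deployments match ->", median_equal)
```

With averaging the two deployments produce identical updates, because the mean of the momentum vectors equals the momentum of the means. With the median they diverge from the second step onward, which is why the placement of momentum becomes a real design choice once a robust aggregation rule is used.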

Code Implementations: 1 repo