LGDCOCMLJun 15, 2020

Distributed Newton Can Communicate Less and Resist Byzantine Workers

arXiv:2006.08737v138 citations
Originality Highly original
AI Analysis

This addresses efficiency and robustness issues in distributed machine learning for scenarios with unreliable workers, representing a novel combination of communication reduction and Byzantine resilience in second-order methods.

The paper tackles the problem of communication overhead and Byzantine failures in distributed second-order optimization by proposing COMRADE, which reduces communication to once per iteration and filters out malicious workers, achieving linear-quadratic convergence with small statistical error rates.

We develop a distributed second order optimization algorithm that is communication-efficient as well as robust against Byzantine failures of the worker machines. We propose COMRADE (COMunication-efficient and Robust Approximate Distributed nEwton), an iterative second order algorithm, where the worker machines communicate only once per iteration with the center machine. This is in sharp contrast with the state-of-the-art distributed second order algorithms like GIANT [34] and DINGO[7], where the worker machines send (functions of) local gradient and Hessian sequentially; thus ending up communicating twice with the center machine per iteration. Moreover, we show that the worker machines can further compress the local information before sending it to the center. In addition, we employ a simple norm based thresholding rule to filter-out the Byzantine worker machines. We establish the linear-quadratic rate of convergence of COMRADE and establish that the communication savings and Byzantine resilience result in only a small statistical error rate for arbitrary convex loss functions. To the best of our knowledge, this is the first work that addresses the issue of Byzantine resilience in second order distributed optimization. Furthermore, we validate our theoretical results with extensive experiments on synthetic and benchmark LIBSVM [5] data-sets and demonstrate convergence guarantees.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes