DC LG OC MLMar 17, 2021

Escaping Saddle Points in Distributed Newton's Method with Communication Efficiency and Byzantine Resilience

Avishek Ghosh, Raj Kumar Maity, Arya Mazumdar, Kannan Ramchandran

arXiv:2103.09424v23.34 citations

Originality Incremental advance

AI Analysis

This addresses the challenge of efficient and resilient optimization in federated learning for applications like distributed machine learning, though it is an incremental extension of an existing method to a new setting.

The authors tackled the problem of saddle-point avoidance in non-convex distributed optimization, particularly under Byzantine attacks, by extending the cubic-regularized Newton method to a distributed framework with communication compression, resulting in a 25% improvement in iteration complexity compared to first-order methods.

The problem of saddle-point avoidance for non-convex optimization is quite challenging in large scale distributed learning frameworks, such as Federated Learning, especially in the presence of Byzantine workers. The celebrated cubic-regularized Newton method of \cite{nest} is one of the most elegant ways to avoid saddle-points in the standard centralized (non-distributed) setup. In this paper, we extend the cubic-regularized Newton method to a distributed framework and simultaneously address several practical challenges like communication bottleneck and Byzantine attacks. Note that the issue of saddle-point avoidance becomes more crucial in the presence of Byzantine machines since rogue machines may create \emph{fake local minima} near the saddle-points of the loss function, also known as the saddle-point attack. Being a second order algorithm, our iteration complexity is much lower than the first order counterparts. Furthermore we use compression (or sparsification) techniques like $δ$-approximate compression for communication efficiency. We obtain theoretical guarantees for our proposed scheme under several settings including approximate (sub-sampled) gradients and Hessians. Moreover, we validate our theoretical findings with experiments using standard datasets and several types of Byzantine attacks, and obtain an improvement of $25\%$ with respect to first order methods in iteration complexity.

View on arXiv PDF

Similar