DCML May 16, 2021

LocalNewton: Reducing Communication Bottleneck for Distributed Learning

arXiv:2105.07320v1, 15 citations
Originality: Highly original
AI Analysis

This paper addresses the communication overhead problem in distributed machine learning systems, offering a practical solution for large-scale training with master-worker frameworks.

The paper tackles the communication bottleneck in distributed optimization by proposing LocalNewton, a second-order algorithm with local averaging. To reach the same training loss, LocalNewton requires fewer than 60% of the communication rounds and less than 40% of the end-to-end running time of state-of-the-art methods.

To address the communication bottleneck problem in distributed optimization within a master-worker framework, we propose LocalNewton, a distributed second-order algorithm with local averaging. In LocalNewton, the worker machines update their model in every iteration by finding a suitable second-order descent direction using only the data and model stored in their own local memory. We let the workers run multiple such iterations locally and communicate the models to the master node only once every few (say L) iterations. LocalNewton is highly practical since it requires only one hyperparameter, the number L of local iterations. We use novel matrix concentration-based techniques to obtain theoretical guarantees for LocalNewton, and we validate them with detailed empirical evaluation. To enhance practicability, we devise an adaptive scheme to choose L, and we show that this reduces the number of local iterations in worker machines between two model synchronizations as the training proceeds, successively refining the model quality at the master. Via extensive experiments using several real-world datasets with AWS Lambda workers and an AWS EC2 master, we show that LocalNewton requires fewer than 60% of the communication rounds (between master and workers) and less than 40% of the end-to-end running time, compared to state-of-the-art algorithms, to reach the same training loss.
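The loop described in the abstract (L local second-order iterations per worker, then one averaging step at the master) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the workers are simulated sequentially in one process, the objective is assumed to be L2-regularized logistic regression, the step size comes from a simple backtracking line search, and L is held fixed rather than chosen adaptively. All function and parameter names (`local_newton`, `local_iters`, `lam`, and so on) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function: 1 / (1 + exp(-z)).
    return np.exp(-np.logaddexp(0.0, -z))


def logistic_loss(w, X, y, lam):
    # L2-regularized logistic loss with labels y in {0, 1} (assumed objective).
    z = X @ w
    return np.mean(np.logaddexp(0.0, z) - y * z) + 0.5 * lam * (w @ w)


def local_newton_step(w, X, y, lam):
    # One Newton step computed from this worker's local data only.
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) / len(y) + lam * w
    hess = (X.T * (p * (1.0 - p))) @ X / len(y) + lam * np.eye(len(w))
    direction = np.linalg.solve(hess, grad)
    # Backtracking line search on the local loss (an assumption of this sketch).
    step, base, slope = 1.0, logistic_loss(w, X, y, lam), grad @ direction
    while (step > 1e-8 and
           logistic_loss(w - step * direction, X, y, lam) > base - 1e-4 * step * slope):
        step *= 0.5
    return w - step * direction


def local_newton(shards, dim, lam=0.1, local_iters=3, rounds=5):
    # shards: list of (X, y) pairs, one per simulated worker.
    # local_iters plays the role of L; rounds counts communication rounds.
    w = np.zeros(dim)
    for _ in range(rounds):
        local_models = []
        for X, y in shards:
            w_k = w.copy()
            for _ in range(local_iters):  # no communication inside this loop
                w_k = local_newton_step(w_k, X, y, lam)
            local_models.append(w_k)
        w = np.mean(local_models, axis=0)  # master averages the local models
    return w
```

In this sketch each round costs one model upload per worker regardless of `local_iters`, so raising `local_iters` trades extra local computation for fewer communication rounds, which is the trade-off the abstract describes.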

