LG Aug 20, 2023

Adaptive pruning-based Newton's method for distributed learning

MIT
arXiv:2308.10154v4, 1 citation, h-index: 32
Originality: Highly original
AI Analysis

The paper addresses scalability and efficiency issues in distributed learning for applications such as large-scale machine learning, though it is an incremental improvement over existing Newton-based methods.

The paper tackles the impracticality of Newton's method in large-scale, heterogeneous distributed learning by proposing DANL, which uses Hessian initialization and adaptive training allocation to achieve linear convergence with efficient communication and strong performance across datasets.

Newton's method leverages curvature information to boost performance, and thus outperforms first-order methods for distributed learning problems. However, Newton's method is impractical in large-scale, heterogeneous learning environments because of obstacles such as the high computation and communication costs of the Hessian matrix, sub-model diversity, staleness of training, and data heterogeneity. To overcome these obstacles, this paper presents a novel and efficient algorithm named Distributed Adaptive Newton Learning (DANL), which addresses the drawbacks of Newton's method by using a simple Hessian initialization and adaptive allocation of training regions. The algorithm's convergence properties are rigorously examined under standard assumptions in stochastic optimization. The theoretical analysis proves that DANL attains a linear convergence rate while efficiently adapting to available resources. Furthermore, DANL shows notable independence from the condition number of the problem and removes the need for complex parameter tuning. Experiments demonstrate that DANL achieves linear convergence with efficient communication and strong performance across different datasets.
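
To make the abstract's first claim concrete, the sketch below contrasts one averaged Newton step with plain gradient steps on an ill-conditioned quadratic split across workers. It is not DANL: it has no Hessian initialization scheme, no adaptive allocation of training regions, and no handling of staleness. The function names, the synthetic data, and the step counts are all assumptions made for illustration.

```python
import numpy as np

def make_workers(num_workers=4, dim=5, seed=0):
    """Hypothetical setup: worker i holds f_i(x) = 0.5 x^T A_i x - b_i^T x."""
    rng = np.random.default_rng(seed)
    workers = []
    for _ in range(num_workers):
        M = rng.standard_normal((dim, dim))
        # Symmetric positive definite and deliberately ill-conditioned.
        A = M @ M.T + np.diag(np.logspace(0, 3, dim))
        b = rng.standard_normal(dim)
        workers.append((A, b))
    return workers

def global_gradient(workers, x):
    """Gradient of the averaged objective (mean of the local gradients)."""
    return np.mean([A @ x - b for A, b in workers], axis=0)

def global_hessian(workers):
    """Hessian of the averaged objective (mean of the local Hessians)."""
    return np.mean([A for A, _ in workers], axis=0)

def averaged_newton_step(workers, x):
    """Server averages local gradients and Hessians, then solves H d = g."""
    g = global_gradient(workers, x)
    H = global_hessian(workers)
    return x - np.linalg.solve(H, g)

def gradient_descent(workers, x, steps):
    """First-order baseline with the classical 1/L step size."""
    L = np.linalg.eigvalsh(global_hessian(workers)).max()
    for _ in range(steps):
        x = x - global_gradient(workers, x) / L
    return x

if __name__ == "__main__":
    workers = make_workers()
    x0 = np.zeros(5)
    x_newton = averaged_newton_step(workers, x0)
    x_gd = gradient_descent(workers, x0, steps=50)
    print("Newton gradient norm:", np.linalg.norm(global_gradient(workers, x_newton)))
    print("GD gradient norm:    ", np.linalg.norm(global_gradient(workers, x_gd)))
```

On a quadratic, a single exact Newton step lands on the minimizer regardless of conditioning, while the gradient baseline slows as the condition number grows. The sketch also shows the cost the abstract mentions: every worker sends a full dim-by-dim Hessian to the server.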

