LG · AI · Sep 24, 2022

Communication-Efficient Federated Learning Using Censored Heavy Ball Descent

Yicheng Chen, Rick S. Blum, Brian M. Sadler

arXiv:2209.11944v1 · 3.34 citations · h-index: 7

Originality: Incremental advance

AI Analysis

This work addresses communication efficiency in distributed learning, particularly in wireless and battery-powered settings, but it is an incremental advance because it builds on existing heavy ball methods.
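For reference, the classical heavy ball (Polyak momentum) iteration is shown below. The notation (step size α, momentum β, objective f split over M workers as f = Σ f_m) is ours and may differ from the paper's.

```latex
x^{k+1} = x^{k} - \alpha \nabla f\left(x^{k}\right) + \beta \left(x^{k} - x^{k-1}\right),
\qquad f(x) = \sum_{m=1}^{M} f_m(x), \quad \alpha > 0,\ 0 \le \beta < 1
```

The term β(x^k − x^{k−1}) carries momentum from the previous step, so each new iterate does not depend on the newest gradient alone.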

The paper tackles the high communication cost of distributed machine learning by proposing a censoring-based heavy ball (CHB) method in which each worker self-censors, that is, skips its transmission, unless its local gradient has changed significantly since its last report. CHB achieves a linear convergence rate equivalent to that of the classical heavy ball method for smooth, strongly convex objectives and, under some conditions, eliminates at least half of all communications without slowing optimization.
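A minimal way to write the self-censoring idea, assuming the server simply reuses the last gradient each worker transmitted, is sketched below. The symbols ĝ_m^k (the server's copy of worker m's gradient) and S^k (the set of workers that transmit at iteration k) are illustrative notation, not necessarily the paper's.

```latex
\hat{g}_m^{k} =
\begin{cases}
\nabla f_m\left(x^{k}\right), & m \in \mathcal{S}^{k} \quad \text{(worker } m \text{ transmits)},\\
\hat{g}_m^{k-1}, & m \notin \mathcal{S}^{k} \quad \text{(worker } m \text{ censors)},
\end{cases}
\qquad
x^{k+1} = x^{k} - \alpha \sum_{m=1}^{M} \hat{g}_m^{k} + \beta \left(x^{k} - x^{k-1}\right)
```

Membership in S^k is decided by each worker's censoring test; the paper's exact threshold is not reproduced here.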

Distributed machine learning enables scalability and computational offloading, but requires significant levels of communication. Consequently, communication efficiency in distributed learning settings is an important consideration, especially when the communications are wireless and battery-driven devices are employed. In this paper we develop a censoring-based heavy ball (CHB) method for distributed learning in a server-worker architecture. Each worker self-censors unless its local gradient is sufficiently different from the previously transmitted one. The significant practical advantages of the HB method for learning problems are well known, but the question of reducing communications has not been addressed. CHB takes advantage of the HB smoothing to eliminate reporting small changes, and provably achieves a linear convergence rate equivalent to that of the classical HB method for smooth and strongly convex objective functions. The convergence guarantee of CHB is theoretically justified for both convex and nonconvex cases. In addition, we prove that, under some conditions, at least half of all communications can be eliminated without any impact on convergence rate. Extensive numerical results validate the communication efficiency of CHB on both synthetic and real datasets, for convex, nonconvex, and nondifferentiable cases. Given a target accuracy, CHB can significantly reduce the number of communications compared to existing algorithms, achieving the same accuracy without slowing down the optimization process.
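The sketch below simulates the server-worker loop described in the abstract on a least-squares problem split across three in-process "workers". Everything specific here is an assumption for illustration: the function name `censored_heavy_ball`, the quadratic objective, the step sizes, and especially the censoring rule, which is a simple LAG-style test comparing each worker's gradient drift with the latest iterate change rather than the paper's exact criterion.

```python
import numpy as np


def censored_heavy_ball(As, bs, alpha, beta, tau, iters):
    """Minimize sum_m 0.5 * ||A_m x - b_m||^2 over M simulated workers.

    Hypothetical censoring rule (not the paper's exact test): worker m re-sends
    its local gradient only if ||g_new - g_sent||^2 > tau * ||x^k - x^(k-1)||^2;
    otherwise it stays silent and the server reuses the stale copy.
    """
    M, d = len(As), As[0].shape[1]
    x_prev = np.zeros(d)
    x = np.zeros(d)
    sent = [np.zeros(d) for _ in range(M)]  # server-side copies of gradients
    transmissions = 0
    for k in range(iters):
        step_sq = float(np.sum((x - x_prev) ** 2))
        for m in range(M):
            g = As[m].T @ (As[m] @ x - bs[m])  # local gradient at x^k
            # Everyone transmits in the first round; afterwards, only on drift.
            if k == 0 or np.sum((g - sent[m]) ** 2) > tau * step_sq:
                sent[m] = g
                transmissions += 1
        # Heavy ball step using the (possibly stale) aggregated gradient.
        x_next = x - alpha * sum(sent) + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x, transmissions


rng = np.random.default_rng(0)
As = [rng.standard_normal((20, 5)) for _ in range(3)]
As[2] *= 0.01  # one worker with weak local curvature, so it censors often
bs = [rng.standard_normal(20) for _ in range(3)]

H = sum(A.T @ A for A in As)
L = np.linalg.eigvalsh(H).max()
alpha, beta = 1.0 / L, 0.5
tau = (0.01 * L / len(As)) ** 2  # small threshold keeps staleness error tiny

x, sent_count = censored_heavy_ball(As, bs, alpha, beta, tau, iters=300)
x_star = np.linalg.lstsq(np.vstack(As), np.concatenate(bs), rcond=None)[0]
print(np.allclose(x, x_star, atol=1e-6), sent_count, "of", 3 * 300)
```

The third worker's data is scaled down so its gradient barely moves between iterations; it stays silent in most rounds while the iterate still reaches the least-squares solution. A larger `tau` would skip more transmissions but allows larger stale-gradient errors in the heavy ball step.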
