LG · OC · Mar 23, 2024

The Effectiveness of Local Updates for Decentralized Learning under Data Heterogeneity

arXiv:2403.15654v3 · 8 citations · h-index: 2 · IEEE Transactions on Signal Processing
Originality: Incremental advance
AI Analysis

This work addresses communication bottlenecks in decentralized optimization for distributed machine learning systems, offering incremental improvements to existing methods.

The paper tackles communication efficiency in decentralized learning under data heterogeneity by analyzing Decentralized Gradient Tracking and Decentralized Gradient Descent with multiple local updates. It shows that local updates can reduce communication complexity, giving explicit bounds for strongly convex and smooth functions and establishing exact linear convergence under over-parameterization.

We revisit two fundamental decentralized optimization methods, Decentralized Gradient Tracking (DGT) and Decentralized Gradient Descent (DGD), with multiple local updates. We consider two settings and demonstrate that incorporating local update steps can reduce communication complexity. Specifically, for $μ$-strongly convex and $L$-smooth loss functions, we prove that local DGT achieves communication complexity $\tilde{\mathcal{O}} \Big(\frac{L}{μ(K+1)} + \frac{δ + μ}{μ(1 - ρ)} + \frac{ρ}{(1 - ρ)^2} \cdot \frac{L + δ}{μ}\Big)$, where $K$ is the number of additional local updates, $ρ$ measures the network connectivity, and $δ$ measures the second-order heterogeneity of the local losses. Our results reveal the tradeoff between communication and computation and show that increasing $K$ can effectively reduce communication costs when the data heterogeneity is low and the network is well-connected. We then consider the over-parameterization regime where the local losses share the same minima. We prove that employing local updates in DGD, even without gradient correction, achieves exact linear convergence under the Polyak-Łojasiewicz (PL) condition, which can yield a similar effect as DGT in reducing communication complexity. A customization of the result to linear models is further provided, with an improved rate expression. Numerical experiments validate our theoretical results.
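To make the over-parameterized setting concrete, here is a minimal sketch of DGD with local updates on a toy least-squares problem in which every local loss shares a common minimizer. The update order (K + 1 local gradient steps followed by one gossip averaging step), the ring topology, the mixing weights, the step-size rule, and all names such as `ring_mixing_matrix` and `local_dgd` are assumptions made for illustration. They are not taken from the paper and may differ from the algorithm analyzed there.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring (assumed topology):
    each agent puts weight 1/3 on itself and on each of its two neighbors."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W

def local_dgd(A, b, W, K, rounds, step):
    """Illustrative local DGD: per communication round, every agent takes
    K + 1 gradient steps on its own loss f_i(x) = ||A_i x - b_i||^2 / (2m),
    then averages its iterate with its neighbors through W."""
    n, m, d = A.shape
    X = np.zeros((n, d))  # row i holds agent i's iterate
    for _ in range(rounds):
        for _ in range(K + 1):
            residual = np.einsum("imd,id->im", A, X) - b
            grad = np.einsum("imd,im->id", A, residual) / m
            X = X - step * grad
        X = W @ X  # one round of communication
    return X

# Toy interpolation problem: b_i = A_i x_star, so x_star minimizes every f_i.
rng = np.random.default_rng(0)
n, m, d = 8, 5, 20  # each agent has fewer samples than parameters
A = rng.standard_normal((n, m, d))
x_star = rng.standard_normal(d)
b = A @ x_star

W = ring_mixing_matrix(n)

# Step size 1/L, with L the largest local smoothness constant.
H = np.einsum("imd,ime->ide", A, A) / m
step = 1.0 / np.linalg.eigvalsh(H).max()

err_init = np.linalg.norm(np.zeros((n, d)) - x_star)
errors = {
    K: np.linalg.norm(local_dgd(A, b, W, K, rounds=200, step=step) - x_star)
    for K in (0, 4)
}
print("initial error:", err_init)
for K, err in errors.items():
    print(f"K={K}: error after 200 rounds = {err:.3e}")
```

Running the script prints the distance to the shared minimizer after the same number of communication rounds for `K = 0` and `K = 4`, which gives a rough feel for the communication and computation tradeoff described above. The sketch makes no claim of reproducing the rates or the experiments in the paper.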

