Local Updates in Distributed Optimization: Provable Acceleration and Topology Effects

arXiv:2601.03442 · 84.3 · h-index: 1

AI Analysis

Provides the first rigorous proof of acceleration via local updates in distributed optimization with exact gradients, offering practical guidance for efficient implementation.

This paper proves that incorporating local updates into the DIGing algorithm can accelerate distributed optimization even when exact gradients are available. Under an appropriate step size, two local updates already achieve the maximal improvement, and further updates provide no additional gain. The acceleration depends on network topology: sparser graphs yield smaller improvements.

Inspired by the success of performing multiple local optimization steps between communication rounds in federated learning, incorporating such local updates into distributed optimization has recently attracted growing interest. However, unlike federated learning, where local updates can accelerate training by reducing gradient estimation error under minibatch settings, it remains unclear whether similar benefits persist when exact gradients are available. Moreover, existing theoretical results typically require reducing the step size when multiple local updates are employed, which can entirely offset any potential benefit of these additional local updates. In this paper, we focus on the classic DIGing algorithm and leverage the tight performance bounds provided by Performance Estimation Problems (PEP) to show that incorporating local updates can indeed accelerate distributed optimization. To the best of our knowledge, this is the first rigorous demonstration of such acceleration for a broad class of objective functions. Our analysis further reveals that, under an appropriate step size, performing only two local updates is sufficient to achieve the maximal possible improvement, and that additional local updates provide no further gains. Because more updates increase computational cost, these findings offer practical guidance for efficient implementation. We also show that these speed gains depend critically on the network structure, with sparser or less connected graphs, characterized by the spectral properties of the mixing matrix, yielding smaller improvements. Extensive experiments on both synthetic and real-world datasets corroborate the theoretical findings.
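
For readers unfamiliar with the setting, the following is a minimal sketch of the standard DIGing (gradient tracking) recursion without local updates, together with the second-largest eigenvalue modulus of the mixing matrix, the spectral quantity commonly used to describe how well connected a network is. It is not the paper's local-update variant, whose exact form is not given in the abstract. The scalar quadratic objectives, the ring and complete-graph mixing matrices, the step size, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic mixing matrix for a ring of n >= 3 agents.
    Each agent averages itself and its two neighbours with weight 1/3 (an illustrative choice)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1.0 / 3.0
        W[i, (i - 1) % n] = 1.0 / 3.0
        W[i, (i + 1) % n] = 1.0 / 3.0
    return W

def complete_mixing_matrix(n):
    """Mixing matrix for a fully connected network: exact averaging in one round."""
    return np.full((n, n), 1.0 / n)

def second_largest_eigenvalue_modulus(W):
    """Second-largest |eigenvalue| of a symmetric mixing matrix.
    Values closer to 1 indicate a sparser, more slowly mixing network."""
    moduli = np.sort(np.abs(np.linalg.eigvalsh(W)))
    return moduli[-2]

def diging(W, a, b, alpha, iters):
    """Standard DIGing without local updates, for the hypothetical local objectives
    f_i(x) = 0.5 * a_i * (x - b_i)^2 with scalar x (one entry of x per agent)."""
    grad = lambda x: a * (x - b)      # exact local gradients, stacked over agents
    x = np.zeros(len(a))              # local iterates
    y = grad(x)                       # gradient trackers, initialised to local gradients
    for _ in range(iters):
        x_new = W @ x - alpha * y             # mix with neighbours, step along tracker
        y = W @ y + grad(x_new) - grad(x)     # track the network-average gradient
        x = x_new
    return x

n = 8
a = np.linspace(1.0, 2.0, n)          # local curvatures (assumed)
b = np.arange(n, dtype=float)         # local minimisers (assumed)
x_star = np.sum(a * b) / np.sum(a)    # minimiser of the sum of the f_i

x_ring = diging(ring_mixing_matrix(n), a, b, alpha=0.02, iters=3000)
print("max error on ring:", np.max(np.abs(x_ring - x_star)))
print("ring SLEM:", second_largest_eigenvalue_modulus(ring_mixing_matrix(n)))
print("complete-graph SLEM:", second_largest_eigenvalue_modulus(complete_mixing_matrix(n)))
```

In this sketch every agent reaches the common minimiser using one gradient evaluation per communication round. The ring's larger second eigenvalue modulus, compared with the complete graph, reflects its sparser connectivity, which is the kind of topology dependence the abstract refers to.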
