Accelerating Decentralized Optimization via Overlapping Local Steps
This work addresses communication delays in decentralized learning, a setting used for scalable and privacy-preserving distributed training, and represents an incremental improvement over existing methods.
The paper tackles the communication bottleneck in decentralized optimization by proposing Overlapping Local Decentralized SGD (OLDSGD), which overlaps computation and communication to reduce network idle time, resulting in improved wall-clock convergence without sacrificing theoretical guarantees.
Decentralized optimization has emerged as a critical paradigm for distributed learning, enabling scalable training while preserving data privacy through peer-to-peer collaboration. However, existing methods often suffer from communication bottlenecks caused by frequent synchronization between nodes. We present Overlapping Local Decentralized SGD (OLDSGD), a novel approach that accelerates decentralized training by overlapping computation with communication, significantly reducing network idle time. With a carefully designed update rule, OLDSGD preserves the same average update as Local SGD while avoiding communication-induced stalls. Theoretically, we establish non-asymptotic convergence rates for smooth non-convex objectives, showing that OLDSGD retains the same iteration complexity as standard Local Decentralized SGD while improving per-iteration runtime. Empirical results demonstrate that OLDSGD consistently improves convergence in wall-clock time under different levels of communication delay. Requiring only minimal modifications to existing frameworks, OLDSGD offers a practical solution for faster decentralized learning without sacrificing theoretical guarantees.
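The abstract does not state the OLDSGD update itself, so the following is only a toy sketch of the general idea of overlapping gossip with local steps, not the authors' algorithm. It assumes scalar parameters, noiseless gradients of the quadratics f_i(x) = 0.5 (x - b_i)^2, and a ring topology with a doubly stochastic mixing matrix; all function names are hypothetical. The blocking round finishes its local steps and then waits for gossip, whereas the overlapped round sends a snapshot at the start of the round and applies the delayed correction (W s - s) once the local steps are done. Because the columns of W sum to one, that correction sums to zero across nodes, so the network average moves only through the local gradient steps.

```python
def ring_matrix(n):
    """Symmetric doubly stochastic mixing matrix for a ring of n >= 3 nodes."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 0.5
        W[i][(i - 1) % n] += 0.25
        W[i][(i + 1) % n] += 0.25
    return W


def mix(W, xs):
    """Gossip step: node i takes the W-weighted average of all node values."""
    return [sum(w * x for w, x in zip(row, xs)) for row in W]


def local_steps(xs, targets, lr, tau):
    """Run tau gradient steps per node on f_i(x) = 0.5 * (x - b_i)^2."""
    xs = list(xs)
    for _ in range(tau):
        xs = [x - lr * (x - b) for x, b in zip(xs, targets)]
    return xs


def blocking_round(W, xs, targets, lr, tau):
    """Local steps first, then a gossip step that every node waits for."""
    return mix(W, local_steps(xs, targets, lr, tau))


def overlapped_round(W, xs, targets, lr, tau):
    """Illustrative overlap (an assumption, not the paper's rule)."""
    snapshot = list(xs)        # values sent at the start of the round
    mixed = mix(W, snapshot)   # in a real system this is in flight during compute
    local = local_steps(xs, targets, lr, tau)
    # Delayed correction: its sum over nodes is zero when W is doubly stochastic.
    return [x + (m - s) for x, m, s in zip(local, mixed, snapshot)]


def mean(xs):
    return sum(xs) / len(xs)


def run(round_fn, rounds=50, n=4, lr=0.1, tau=5):
    """Simulate `rounds` communication rounds and return the final node values."""
    W = ring_matrix(n)
    targets = [float(i) for i in range(n)]       # b_i, average is 1.5 for n = 4
    xs = [10.0 * (i + 1) for i in range(n)]      # arbitrary distinct starting points
    for _ in range(rounds):
        xs = round_fn(W, xs, targets, lr, tau)
    return xs
```

In this special case (identical curvature at every node, exact gradients) the averaged iterate of both variants follows the same recursion, so `mean(run(blocking_round))` and `mean(run(overlapped_round))` agree up to floating-point rounding. With heterogeneous curvature or stochastic gradients the two trajectories would generally differ, and this simulation says nothing about wall-clock time, which is where the benefit claimed in the abstract lies.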