LGOCMLOct 12, 2023

Achieving Linear Speedup with ProxSkip in Distributed Stochastic Optimization

arXiv:2310.07983v41 citationsh-index: 27
Originality Incremental advance
AI Analysis

This addresses a key bottleneck in distributed optimization for improving efficiency in large-scale machine learning systems, though it is incremental as it builds on existing ProxSkip analysis.

The paper tackled the problem of achieving linear speedup with the ProxSkip algorithm in distributed stochastic optimization, proving that it achieves linear speedup across non-convex, convex, and strongly convex settings and can reduce communication complexity with increased local updates.

The ProxSkip algorithm for distributed optimization is gaining increasing attention due to its effectiveness in reducing communication. However, existing analyses of ProxSkip are limited to the strongly convex setting and fail to achieve linear speedup with respect to the number of nodes. Key questions regarding its behavior in the non-convex setting and the achievability of linear speedup remain open. In this paper, we revisit ProxSkip and address both questions. We provide a comprehensive analysis for stochastic non-convex, convex, and strongly convex problems, revealing the effects of gradient noise, local updates, network connectivity, and data heterogeneity on its convergence. We prove that ProxSkip achieves linear speedup across all three settings, and can further achieve linear speedup with network-independent stepsizes in the strongly convex setting. Moreover, we show that properly increasing local updates effectively reduces communication complexity.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes