OC · LG · Mar 3

Convex and Non-convex Federated Learning with Stale Stochastic Gradients: Diminishing Step Size is All You Need

arXiv:2603.02639v1 · h-index: 23
Originality: Incremental advance
AI Analysis

This work addresses efficient federated learning under realistic communication delays, offering a simpler, practical alternative to delay-adaptive step sizes in distributed systems.

The paper tackles the problem of distributed stochastic optimization with delayed and potentially biased gradient estimates, showing that a pre-chosen diminishing step size suffices to achieve optimal convergence rates for nonconvex and strongly convex objectives, matching prior adaptive schemes.

We propose a general framework for distributed stochastic optimization under delayed gradient models. In this setting, $n$ local agents leverage their own data and computation to assist a central server in minimizing a global objective composed of the agents' local cost functions. Each agent may transmit stochastic estimates of its local gradient that are potentially biased and delayed. While prior work has advocated delay-adaptive step sizes for stochastic gradient descent (SGD) in the presence of delays, we demonstrate that a pre-chosen diminishing step size is sufficient and matches the performance of the adaptive scheme. Moreover, our analysis establishes that diminishing step sizes recover the optimal SGD rates for nonconvex and strongly convex objectives.
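The setting described above can be illustrated with a minimal sketch. It is not the paper's algorithm or experiment: the toy quadratic local costs, the one-agent-per-step reporting model, the uniformly random bounded delay, the schedule `a / (t + b)`, and all names (`delayed_sgd`, `targets`, `max_delay`) are assumptions made for illustration, and gradient bias is left out for simplicity. The point it shows is only that the step size is fixed in advance and never looks at the delay.

```python
import random


def delayed_sgd(targets, steps=20000, max_delay=5, a=1.0, b=10.0,
                noise=0.1, seed=0):
    """Toy server-side SGD with stale gradients (illustrative assumptions only).

    Minimizes f(x) = (1/n) * sum_i 0.5 * (x - targets[i])**2, where agent i
    owns the local cost 0.5 * (x - targets[i])**2.
    """
    rng = random.Random(seed)
    n = len(targets)
    x = 0.0
    history = [x]  # past iterates; history[k] is the server model at step k

    for t in range(steps):
        i = rng.randrange(n)                       # agent reporting at step t
        delay = rng.randint(0, min(max_delay, t))  # staleness of its model copy
        x_stale = history[t - delay]               # iterate the agent actually used

        # Noisy local gradient evaluated at the stale iterate.
        grad = (x_stale - targets[i]) + rng.gauss(0.0, noise)

        # Pre-chosen diminishing step size: depends on t only, not on delay.
        eta = a / (t + b)
        x -= eta * grad
        history.append(x)

    return x


if __name__ == "__main__":
    print(delayed_sgd([1.0, 2.0, 3.0, 4.0]))
```

For this toy objective the minimizer is the mean of `targets`, so with the inputs above the returned value should land near 2.5 despite the stale gradients. A delay-adaptive variant would instead shrink `eta` as `delay` grows; the sketch deliberately omits that dependence.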

