OC LG · Mar 3

Convex and Non-convex Federated Learning with Stale Stochastic Gradients: Diminishing Step Size is All You Need

Xinran Zheng, Tara Javidi, Behrouz Touri

arXiv:2603.02639v1 · 2.5 · h-index: 19

Originality: Incremental advance

AI Analysis

This work addresses efficient federated learning under realistic communication delays, offering a simpler, more practical approach for distributed systems.

The paper tackles the problem of distributed stochastic optimization with delayed and potentially biased gradient estimates, showing that a pre-chosen diminishing step size suffices to achieve optimal convergence rates for nonconvex and strongly convex objectives, matching prior adaptive schemes.

We propose a general framework for distributed stochastic optimization under delayed gradient models. In this setting, $n$ local agents leverage their own data and computation to assist a central server in minimizing a global objective composed of the agents' local cost functions. Each agent is allowed to transmit stochastic (potentially biased and delayed) estimates of its local gradient. While prior work has advocated delay-adaptive step sizes for stochastic gradient descent (SGD) in the presence of delays, we demonstrate that a pre-chosen diminishing step size is sufficient and matches the performance of the adaptive scheme. Moreover, our analysis establishes that diminishing step sizes recover the optimal SGD rates for nonconvex and strongly convex objectives.
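To make the setting concrete, here is a minimal Python sketch of server-side SGD with stale gradients and a pre-chosen diminishing step size. It is an illustration under stated assumptions, not the authors' algorithm or analysis: the function name `delayed_sgd`, the scalar quadratic local costs $f_i(x) = \frac{1}{2}(x - c_i)^2$, the uniformly random bounded delay, the Gaussian noise, and the step-size schedule $\eta_t = a/(t+b)$ with its constants are all hypothetical choices made for the example.

```python
import random


def delayed_sgd(centers, steps=5000, max_delay=5, noise=0.1,
                a=1.0, b=10.0, seed=0):
    """Toy server loop: SGD on the average of f_i(x) = 0.5 * (x - c_i)^2.

    Each agent reports a noisy gradient evaluated at a stale iterate.
    All modelling choices here are assumptions for illustration only.
    """
    rng = random.Random(seed)
    n = len(centers)
    x = 0.0
    history = [x]  # history[k] holds the iterate x_k

    for t in range(steps):
        grad = 0.0
        for c in centers:
            # Assumed delay model: uniform in {0, ..., max_delay},
            # capped so we never look before the initial iterate.
            delay = rng.randint(0, min(max_delay, t))
            stale_x = history[t - delay]
            # Stochastic gradient of the local cost at the stale iterate.
            grad += (stale_x - c) + rng.gauss(0.0, noise)
        grad /= n

        # Pre-chosen diminishing step size: fixed in advance,
        # independent of the delays actually observed.
        eta = a / (t + b)
        x -= eta * grad
        history.append(x)

    return x


if __name__ == "__main__":
    centers = [1.0, 2.0, 6.0]  # minimizer of the average cost is their mean
    print(delayed_sgd(centers))
```

In this toy problem the global minimizer is the mean of the `centers`, so the returned iterate can be compared against it directly. The step-size schedule never inspects the sampled delays, which is the property the abstract contrasts with delay-adaptive schemes.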
