Asynchronous Federated Stochastic Optimization for Heterogeneous Objectives Under Arbitrary Delays
This addresses scalability and bias issues in federated learning for applications like healthcare or finance where data privacy and heterogeneous client speeds are critical, representing a novel advancement rather than an incremental improvement.
The paper tackles the problem of slow training and bias in asynchronous federated learning due to heterogeneous client delays and non-IID data by proposing AREA, a method that uses client-side memory to correct bias without sampling, achieving optimal convergence rates of O(1/K) for strongly convex, smooth objectives and O(1/√K) for convex, nonsmooth ones.
Federated learning (FL) was recently proposed to securely train models with data held over multiple locations (``clients'') under the coordination of a central server. Prolonged training times caused by slow clients may hinder the performance of FL; while asynchronous communication is a promising solution, highly heterogeneous client response times under non-IID local data may introduce significant bias to the global model, particularly in client-driven setups where sampling is infeasible. To address this issue, we propose \underline{A}synch\underline{R}onous \underline{E}xact \underline{A}veraging (\textsc{AREA}), a stochastic (sub)gradient method that leverages asynchrony for scalability and uses client-side memory to correct the bias induced by uneven participation, without client sampling or prior knowledge of client latencies. \textsc{AREA} communicates model residuals rather than gradient estimates, reducing exposure to gradient inversion, and is compatible with secure aggregation. Under standard assumptions and unbounded, heterogeneous delays with finite mean, AREA achieves optimal convergence rates: $\mathcal{O}(1/K)$ in the strongly convex, smooth regime and $\mathcal{O}(1/\sqrt{K})$ in the convex, nonsmooth regime. For strongly convex, smooth objectives, we demonstrate theoretically and empirically that AREA accommodates larger step sizes than existing methods, enabling fast convergence without adversely impacting model generalization. In the convex, nonsmooth setting, to our knowledge we are the first to obtain rates that scale with the average client update frequency rather than the minimum or maximum, indicating increased robustness to outliers.