LG · DC · ML · Mar 18, 2021

Semi-Decentralized Federated Learning with Cooperative D2D Local Model Aggregations

arXiv:2103.10481v3 · 140 citations
AI Analysis

This paper addresses efficiency and robustness challenges in federated learning for edge computing, particularly under statistical heterogeneity. The contribution appears incremental, since it builds on existing federated learning paradigms.

The paper tackles federated learning at the wireless edge by proposing a semi-decentralized architecture that combines device-to-server and device-to-device communications. Compared with existing methods, it reports improved model accuracy and/or reduced network energy consumption, and its experiments show robustness against channel outages as well as favorable performance with non-convex losses.

Federated learning has emerged as a popular technique for distributing machine learning (ML) model training across the wireless edge. In this paper, we propose two timescale hybrid federated learning (TT-HF), a semi-decentralized learning architecture that combines the conventional device-to-server communication paradigm for federated learning with device-to-device (D2D) communications for model training. In TT-HF, during each global aggregation interval, devices (i) perform multiple stochastic gradient descent iterations on their individual datasets, and (ii) aperiodically engage in a consensus procedure on their model parameters through cooperative, distributed D2D communications within local clusters. With a new general definition of gradient diversity, we formally study the convergence behavior of TT-HF, resulting in new convergence bounds for distributed ML. We leverage our convergence bounds to develop an adaptive control algorithm that tunes the step size, D2D communication rounds, and global aggregation period of TT-HF over time to target a sublinear convergence rate of O(1/t) while minimizing network resource utilization. Our subsequent experiments demonstrate that TT-HF significantly outperforms the current art in federated learning in terms of model accuracy and/or network energy consumption in different scenarios where local device datasets exhibit statistical heterogeneity. Finally, our numerical evaluations demonstrate robustness against outages caused by fading channels, as well as favorable performance with non-convex loss functions.
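The two timescales described in the abstract (local gradient steps interleaved with in-cluster D2D consensus, followed by a global aggregation) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a quadratic loss per device, exact gradients in place of stochastic ones, a fixed consensus period instead of the aperiodic schedule, a fixed step size instead of the adaptive control algorithm, and a server that samples one device per cluster and weights it by cluster size. All function and parameter names (`local_sgd_step`, `d2d_consensus`, `tt_hf_sketch`, `eps`, `consensus_every`) are hypothetical.

```python
import random


def local_sgd_step(w, target, lr):
    """One gradient step on the toy loss 0.5 * ||w - target||^2 (stand-in for SGD)."""
    return [wk - lr * (wk - tk) for wk, tk in zip(w, target)]


def d2d_consensus(models, neighbors, rounds, eps):
    """Run `rounds` of linear consensus inside one cluster.

    models:    list of parameter vectors, one per device
    neighbors: neighbors[i] lists the devices adjacent to device i (symmetric graph)
    eps:       mixing weight, assumed smaller than 1 / (maximum node degree)
    """
    for _ in range(rounds):
        # Synchronous update: every device mixes with the previous round's models.
        models = [
            [w[k] + eps * sum(models[j][k] - w[k] for j in neighbors[i])
             for k in range(len(w))]
            for i, w in enumerate(models)
        ]
    return models


def tt_hf_sketch(targets, neighbors, global_rounds=20, local_steps=5,
                 consensus_every=2, consensus_rounds=2, lr=0.1, eps=0.3, seed=0):
    """Toy two-timescale loop.

    targets[c][i] is the loss minimizer of device i in cluster c, and
    neighbors[c] is the D2D adjacency list of cluster c.
    """
    rng = random.Random(seed)
    dim = len(targets[0][0])
    global_w = [0.0] * dim
    for _ in range(global_rounds):
        sampled = []
        for cluster_targets, cluster_nbrs in zip(targets, neighbors):
            # Every device starts the interval from the broadcast global model.
            models = [list(global_w) for _ in cluster_targets]
            for step in range(1, local_steps + 1):
                models = [local_sgd_step(w, t, lr)
                          for w, t in zip(models, cluster_targets)]
                # Assumption: consensus on a fixed period, not aperiodically.
                if step % consensus_every == 0:
                    models = d2d_consensus(models, cluster_nbrs,
                                           consensus_rounds, eps)
            # Assumption: the server samples one device per cluster.
            sampled.append((rng.choice(models), len(models)))
        total = sum(n for _, n in sampled)
        global_w = [sum(w[k] * n for w, n in sampled) / total
                    for k in range(dim)]
    return global_w
```

As a usage example, calling `tt_hf_sketch` with two fully connected clusters of three devices whose targets are 1, 2, 3 and 4, 5, 6 should return a model near the overall mean of 3.5. The residual gap comes from sampling a single device whose model has not fully reached the cluster average, which is the kind of consensus error that additional D2D rounds are meant to shrink.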

Code Implementations: 1 repo