SP LG Dec 31, 2020

Bayesian Federated Learning over Wireless Networks

Seunghoon Lee, Chanho Park, Song-Nam Hong, Yonina C. Eldar, Namyoon Lee

arXiv:2012.15486v18.624 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of aggregating quantized gradients efficiently and accurately in federated learning (FL) over wireless networks, a problem faced by researchers and practitioners deploying FL in resource-constrained environments.

This paper proposes a Bayesian federated learning (BFL) algorithm to optimally aggregate heterogeneous quantized gradient information in wireless networks, minimizing mean-squared error (MSE). To address the high communication and computational costs of BFL, a scalable version called SBFL is introduced, which significantly outperforms conventional sign stochastic gradient descent when training neural networks on MNIST datasets over heterogeneous wireless networks.

Federated learning is a privacy-preserving and distributed training method using heterogeneous data sets stored at local devices. Federated learning over wireless networks requires aggregating locally computed gradients at a server where the mobile devices send statistically distinct gradient information over heterogeneous communication links. This paper proposes a Bayesian federated learning (BFL) algorithm to aggregate the heterogeneous quantized gradient information optimally in the sense of minimizing the mean-squared error (MSE). The idea of BFL is to aggregate the one-bit quantized local gradients at the server by jointly exploiting i) the prior distributions of the local gradients, ii) the gradient quantizer function, and iii) channel distributions. Implementing BFL requires high communication and computational costs as the number of mobile devices increases. To address this challenge, we also present an efficient modified BFL algorithm called scalable-BFL (SBFL). In SBFL, we assume a simplified distribution on the local gradient. Each mobile device sends its one-bit quantized local gradient together with two scalar parameters representing this distribution. The server then aggregates the noisy and faded quantized gradients to minimize the MSE. We provide a convergence analysis of SBFL for a class of non-convex loss functions. Our analysis elucidates how the parameters of communication channels and the gradient priors affect convergence. From simulations, we demonstrate that SBFL considerably outperforms the conventional sign stochastic gradient descent algorithm when training and testing neural networks using MNIST data sets over heterogeneous wireless networks.
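The aggregation idea in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: it assumes, purely for illustration, that each scalar local gradient has a Gaussian prior N(mu, sigma^2) (so the "two scalar parameters" are taken to be mu and sigma), that the device sends sign(g), and that the server receives y = h * sign(g) + n with a known real gain h and Gaussian noise of variance noise_var. The function names and the channel model are hypothetical. Under these assumptions the MSE-minimizing estimate is the posterior mean E[g | y], obtained by weighting the two truncated-Gaussian means by the posterior probability of each sign.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mmse_from_sign(y, h, noise_var, mu, sigma):
    """Posterior mean of g ~ N(mu, sigma^2) given y = h * sign(g) + n.

    Assumed model (illustrative, not taken from the paper):
    n ~ N(0, noise_var), and h is a known real channel gain.
    Intended for moderate mu / sigma; extreme ratios make a CDF underflow.
    """
    r = mu / sigma
    # Means of the Gaussian prior truncated to g > 0 and to g < 0.
    mean_pos = mu + sigma * norm_pdf(r) / norm_cdf(r)
    mean_neg = mu - sigma * norm_pdf(r) / norm_cdf(-r)
    # Log of (prior probability of the sign) * (channel likelihood of y).
    log_w_pos = math.log(norm_cdf(r)) - (y - h) ** 2 / (2.0 * noise_var)
    log_w_neg = math.log(norm_cdf(-r)) - (y + h) ** 2 / (2.0 * noise_var)
    # Normalize in a numerically stable way.
    m = max(log_w_pos, log_w_neg)
    w_pos = math.exp(log_w_pos - m)
    w_neg = math.exp(log_w_neg - m)
    p_pos = w_pos / (w_pos + w_neg)
    return p_pos * mean_pos + (1.0 - p_pos) * mean_neg


def aggregate(ys, hs, noise_vars, mus, sigmas):
    """Average the per-device posterior-mean estimates (one scalar each)."""
    estimates = [
        mmse_from_sign(y, h, nv, mu, s)
        for y, h, nv, mu, s in zip(ys, hs, noise_vars, mus, sigmas)
    ]
    return sum(estimates) / len(estimates)


def majority_vote(ys):
    """Sign-based baseline: sign of the sum of hard-decided received signs."""
    total = sum(1 if y >= 0 else -1 for y in ys)
    return (total > 0) - (total < 0)
```

In this toy model a device with a noisy link contributes an estimate close to its prior mean, while a device with a clean link contributes the full truncated-Gaussian mean, so unreliable links are down-weighted automatically. A plain majority vote over received signs treats every device identically.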
