Federated Learning in Non-IID Settings Aided by Differentially Private Synthetic Data
This work addresses the problem of data heterogeneity in federated learning systems, offering an incremental improvement over existing methods.
The paper tackles performance degradation in federated learning due to non-IID data by proposing FedDPMS, which uses differentially private synthetic data to augment local datasets, and it outperforms state-of-the-art methods in deep image classification tasks.
Federated learning (FL) is a privacy-promoting framework that enables a potentially large number of clients to collaboratively train machine learning models. In an FL system, a server coordinates the collaboration by collecting and aggregating clients' model updates while the clients' data remains local and private. A major challenge in federated learning arises when the local data is heterogeneous, a setting in which the performance of the learned global model may deteriorate significantly compared to the scenario where the data is identically distributed across the clients. In this paper we propose FedDPMS (Federated Differentially Private Means Sharing), an FL algorithm in which clients deploy variational auto-encoders to augment local datasets with data synthesized using differentially private means of latent data representations communicated by a trusted server. Such augmentation ameliorates the effects of data heterogeneity across the clients without compromising privacy. Our experiments on deep image classification tasks demonstrate that FedDPMS outperforms competing state-of-the-art FL methods specifically designed for heterogeneous data settings.
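To make the idea of a differentially private mean of latent representations concrete, the following is a minimal sketch and not the FedDPMS implementation. The abstract does not state the noise mechanism, the clipping rule, or the privacy parameters, so the sketch assumes L2-norm clipping followed by the classical Gaussian mechanism. The function names `clip_to_norm`, `gaussian_sigma`, and `dp_mean`, and the parameters `max_norm`, `epsilon`, and `delta`, are hypothetical and chosen only for illustration. The latent vectors would come from a client's VAE encoder; here they are plain lists of floats.

```python
import math
import random


def clip_to_norm(vec, max_norm):
    """Rescale vec so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    scale = max_norm / norm
    return [x * scale for x in vec]


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale of the classical Gaussian mechanism (assumes 0 < epsilon < 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def dp_mean(latents, max_norm, epsilon, delta, rng):
    """Return a noisy mean of latent vectors (assumed mechanism, not the paper's)."""
    n = len(latents)
    dim = len(latents[0])
    # Clipping bounds how much any single record can influence the mean.
    clipped = [clip_to_norm(z, max_norm) for z in latents]
    mean = [sum(z[i] for z in clipped) / n for i in range(dim)]
    # Replacing one record moves the mean by at most 2 * max_norm / n in L2 norm.
    sigma = gaussian_sigma(2.0 * max_norm / n, epsilon, delta)
    return [m + rng.gauss(0.0, sigma) for m in mean]


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for encoder outputs of one class on one client (hypothetical data).
    latents = [[rng.gauss(1.0, 0.5) for _ in range(4)] for _ in range(200)]
    print(dp_mean(latents, max_norm=3.0, epsilon=0.5, delta=1e-5, rng=rng))
```

Under these assumptions, a client would release only the noisy mean, and other clients could decode samples drawn around it with their own VAE decoders to synthesize additional training data. The noise scale shrinks as the number of local samples grows, which is why the mean is a comparatively cheap statistic to privatize.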