Mitigating Data Absence in Federated Learning Using Privacy-Controllable Data Digests
This work addresses performance degradation in federated learning caused by data absence and offers a solution for cross-silo applications. However, the contribution appears incremental, since its plugin design builds on existing FL methods.
The paper tackles data absence and distribution changes in federated learning, particularly in cross-silo scenarios. It introduces the FedDig framework, which uses privacy-controllable data digests to manage these issues and outperforms five baseline algorithms across four public datasets.
In federated learning (FL), the absence of training data and changes in its distribution can significantly undermine model performance, especially in cross-silo scenarios. To address this challenge, we introduce the Federated Learning with Data Digest (FedDig) framework. FedDig manages unexpected distribution changes using a novel privacy-controllable data digest representation. FL users can adjust the digest's protection level through hyperparameters that control the mixing of multiple low-dimensional features and the differential privacy perturbation applied to the mixed features. Evaluated on four diverse public datasets, FedDig consistently outperforms five baseline algorithms by substantial margins in various data absence scenarios. We also thoroughly explore FedDig's hyperparameters, demonstrating its adaptability. Notably, FedDig's plugin design is inherently extensible and compatible with existing FL algorithms.
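To make the digest idea concrete, the following is a minimal sketch of one way a digest could be formed from low-dimensional features: clip each feature vector, average small random groups of them, and add Gaussian noise. It is an illustration under assumptions, not FedDig's actual construction, which the abstract does not specify. The function name `make_digests` and the parameters `samples_per_digest`, `noise_std`, and `clip_norm` are hypothetical, and the noise level is a free parameter that is not calibrated to any particular privacy budget.

```python
import math
import random


def make_digests(features, samples_per_digest=4, noise_std=0.1,
                 clip_norm=1.0, seed=None):
    """Mix groups of feature vectors and perturb the result (illustrative only).

    features: list of equal-length lists of floats (low-dimensional features).
    Returns one noisy, averaged vector per complete group of samples.
    """
    rng = random.Random(seed)

    # Clip each vector to L2 norm <= clip_norm so that a single sample's
    # influence on a mixed feature is bounded.
    clipped = []
    for vec in features:
        norm = math.sqrt(sum(x * x for x in vec))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in vec])

    # Shuffle so that each group mixes randomly chosen samples.
    rng.shuffle(clipped)

    digests = []
    # Leftover samples that do not fill a complete group are dropped.
    usable = len(clipped) - len(clipped) % samples_per_digest
    for start in range(0, usable, samples_per_digest):
        group = clipped[start:start + samples_per_digest]
        # Mixing step: element-wise mean over the group.
        mixed = [sum(col) / samples_per_digest for col in zip(*group)]
        # Perturbation step: independent Gaussian noise per coordinate
        # (assumed mechanism; not calibrated to a specific epsilon/delta).
        digests.append([m + rng.gauss(0.0, noise_std) for m in mixed])
    return digests
```

In this sketch, raising `samples_per_digest` or `noise_std` makes each digest reveal less about any individual sample, at the cost of a less faithful representation of the data. That is the kind of protection-versus-utility trade-off the abstract describes as adjustable through hyperparameters.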