Adaptive Parameterization of Deep Learning Models for Federated Learning
This work addresses communication bottlenecks in distributed deep learning systems, though the contribution appears incremental, as it adapts existing Adapters to Federated Learning.
The paper tackles the communication overhead problem in Federated Learning by proposing the use of parallel Adapters, achieving inference performance similar to full-model training while reducing communication overhead by roughly 90%.
Federated Learning offers a way to train deep neural networks in a distributed fashion. While this addresses limitations related to distributed data, it incurs a communication overhead, as the model parameters or gradients need to be exchanged regularly during training. This can become an issue when learning tasks are distributed at large scale, and it can negate the benefit of distributing the resources in the first place. In this paper, we propose to utilise parallel Adapters for Federated Learning. Using various datasets, we show that Adapters can be incorporated into different Federated Learning techniques. We highlight that our approach can achieve similar inference performance compared to training the full model while reducing the communication overhead by roughly 90%. We further explore the applicability of Adapters in cross-silo and cross-device settings, as well as under different non-IID data distributions.
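To make the communication argument concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single dense layer whose weights stay frozen on every client, plus a parallel bottleneck adapter (a down-projection followed by an up-projection) whose parameters are the only ones exchanged and averaged. The layer width, bottleneck width, and the helper names `linear_params`, `parallel_adapter_params`, and `fedavg` are hypothetical choices for illustration and do not come from the paper.

```python
# Illustrative sketch only: sizes and helper names are assumptions,
# not values or APIs taken from the paper.

def linear_params(d_in, d_out):
    """Number of weights plus biases in a dense layer."""
    return d_in * d_out + d_out


def parallel_adapter_params(d_in, d_out, bottleneck):
    """Parameters of a bottleneck adapter placed in parallel to a frozen layer:
    a down-projection (d_in -> bottleneck) and an up-projection (bottleneck -> d_out)."""
    return linear_params(d_in, bottleneck) + linear_params(bottleneck, d_out)


def fedavg(client_updates, client_sizes):
    """Weighted average of per-client parameter vectors (lists of floats),
    weighted by each client's number of local samples."""
    total = sum(client_sizes)
    n = len(client_updates[0])
    return [
        sum(update[i] * size for update, size in zip(client_updates, client_sizes)) / total
        for i in range(n)
    ]


hidden = 512      # hypothetical layer width
bottleneck = 24   # hypothetical adapter bottleneck width

full = linear_params(hidden, hidden)                           # sent when training the full layer
adapter = parallel_adapter_params(hidden, hidden, bottleneck)  # sent when training adapters only
saving = 1 - adapter / full                                    # fraction of communication avoided

print(f"full layer: {full} parameters, adapter: {adapter} parameters")
print(f"communication saved per round: {saving:.1%}")

# Only the adapter parameters are aggregated on the server; toy two-client example.
averaged = fedavg([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3])
print(averaged)
```

With these assumed sizes the adapter carries roughly a tenth of the layer's parameters, which is the order of magnitude behind a saving of about 90%. The actual ratio depends on the architecture and on the bottleneck width chosen.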