FedH2L: Federated Learning with Model and Statistical Heterogeneity
This work addresses the challenge of real-world federated learning, where participants have different network architectures and non-uniform data distributions, though it appears incremental because it builds on existing distillation methods.
The paper tackles federated learning under model and statistical heterogeneity by introducing FedH2L, which uses mutual distillation on a shared seed set. The approach is bandwidth efficient and model agnostic, and it yields models that perform well across the whole data distribution even when each participant learns from a heterogeneous silo.
Federated learning (FL) enables distributed participants to collectively learn a strong global model without sacrificing their individual data privacy. Mainstream FL approaches require each participant to share a common network architecture and further assume that data are sampled IID across participants. However, in real-world deployments participants may require heterogeneous network architectures, and the data distribution is almost certainly non-uniform across participants. To address these issues we introduce FedH2L, which is agnostic to the model architecture and robust to different data distributions across participants. In contrast to approaches sharing parameters or gradients, FedH2L relies on mutual distillation, exchanging only posteriors on a shared seed set between participants in a decentralized manner. This makes it extremely bandwidth efficient and model agnostic, and crucially it produces models capable of performing well on the whole data distribution when learning from heterogeneous silos.
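The exchange described above can be illustrated with a minimal sketch. It is not the authors' implementation: the KL-based loss, its direction, the function names, and the toy logits are all assumptions made for illustration. Each participant turns its own model's outputs on the shared seed set into posteriors, sends only those posteriors to its peers, and measures how far its predictions diverge from theirs.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability vector (posterior)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(own_posteriors, peer_posteriors_list):
    """Mean KL(peer || own) over all peers and all seed examples.

    Assumption: the peer posterior acts as the target distribution.
    The abstract does not specify the exact loss.
    """
    total, count = 0.0, 0
    for peer_posteriors in peer_posteriors_list:
        for own, peer in zip(own_posteriors, peer_posteriors):
            total += kl_divergence(peer, own)
            count += 1
    return total / count

# Hypothetical logits from two participants with different architectures,
# evaluated on the same three seed examples (three classes each).
logits_a = [[2.0, 0.5, -1.0], [0.1, 1.2, 0.3], [-0.5, 0.0, 2.5]]
logits_b = [[1.5, 0.7, -0.8], [0.0, 0.9, 0.8], [-1.0, 0.2, 2.0]]

# Only these posteriors cross the network; parameters and gradients stay local.
posteriors_a = [softmax(z) for z in logits_a]
posteriors_b = [softmax(z) for z in logits_b]

# Each participant computes its own distillation term from the peer's posteriors.
loss_a = distillation_loss(posteriors_a, [posteriors_b])
loss_b = distillation_loss(posteriors_b, [posteriors_a])

# Communication cost per round: seed examples x classes, independent of model size.
payload_floats = len(posteriors_a) * len(posteriors_a[0])
```

In a full training loop, each participant would presumably combine such a term with a supervised loss on its private data before updating its local model. The point of the sketch is that the payload depends only on the seed set size and the number of classes, which is why the scheme needs no shared architecture.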