LG, AI, ML
Dec 1, 2020

Communication-Efficient Federated Distillation

arXiv:2012.00632v1
42 citations
AI Analysis

This work significantly improves communication efficiency for Federated Learning, benefiting applications with large models or limited bandwidth.

This paper addresses communication constraints in Federated Learning by proposing Compressed Federated Distillation (CFD). CFD reduces communication by over two orders of magnitude compared to standard Federated Distillation (FD) and over four orders of magnitude compared to Federated Averaging (FA) for fixed performance targets in image classification and language modeling.

Communication constraints are one of the major challenges preventing the widespread adoption of Federated Learning systems. Recently, Federated Distillation (FD), a new algorithmic paradigm for Federated Learning with fundamentally different communication properties, emerged. FD methods leverage ensemble distillation techniques and exchange model outputs, presented as soft labels on an unlabeled public data set, between the central server and the participating clients. In conventional Federated Learning algorithms such as Federated Averaging (FA), communication scales with the size of the jointly trained model. In FD, communication instead scales with the size of the distillation data set, which gives advantageous communication properties, especially when large models are trained. In this work, we investigate FD from the perspective of communication efficiency by analyzing the effects of active distillation-data curation, soft-label quantization and delta-coding techniques. Based on the insights gathered from this analysis, we present Compressed Federated Distillation (CFD), an efficient Federated Distillation method. Extensive experiments on Federated image classification and language modeling problems demonstrate that our method can reduce the amount of communication necessary to achieve fixed performance targets by more than two orders of magnitude when compared to FD, and by more than four orders of magnitude when compared to FA.
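The scaling argument in the abstract can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, 32-bit floats are assumed for uncompressed values, and "quantization" is reduced to its simplest form (keeping only the argmax class per sample), which may differ from the scheme CFD actually uses. Delta coding is shown as sending only the labels that changed since the previous round.

```python
import math

def fa_bits(num_params, bits_per_weight=32):
    """Bits a client uploads per round when it sends a full model update (FA-style)."""
    return num_params * bits_per_weight

def fd_bits(num_samples, num_classes, bits_per_value=32):
    """Bits a client uploads per round when it sends soft labels on the public set (FD-style)."""
    return num_samples * num_classes * bits_per_value

def hard_label_bits(num_samples, num_classes):
    """Bits needed if each soft label is reduced to a single class index."""
    return num_samples * math.ceil(math.log2(num_classes))

def quantize_hard(soft_labels):
    """Assumed quantizer: keep only the index of the most probable class per sample."""
    return [max(range(len(p)), key=p.__getitem__) for p in soft_labels]

def delta_encode(prev_labels, curr_labels):
    """Return (position, new_label) pairs for entries that changed since the last round."""
    return [(i, c) for i, (p, c) in enumerate(zip(prev_labels, curr_labels)) if p != c]

def delta_decode(prev_labels, delta):
    """Rebuild the current labels from the previous round's labels plus the delta."""
    out = list(prev_labels)
    for i, c in delta:
        out[i] = c
    return out

if __name__ == "__main__":
    # Hypothetical sizes chosen for illustration only; they are not taken from the paper.
    params, samples, classes = 11_000_000, 10_000, 10
    print("FA bits/round:        ", fa_bits(params))
    print("FD bits/round:        ", fd_bits(samples, classes))
    print("hard-label bits/round:", hard_label_bits(samples, classes))

    prev = quantize_hard([[0.2, 0.8], [0.6, 0.4], [0.1, 0.9]])
    curr = quantize_hard([[0.3, 0.7], [0.4, 0.6], [0.1, 0.9]])
    delta = delta_encode(prev, curr)
    print("changed entries:", delta)
    assert delta_decode(prev, delta) == curr
```

The point of the sketch is only the shape of the costs: the FA cost grows with the parameter count, while the FD costs depend on the number of distillation samples and classes, and delta coding shrinks the payload further when predictions stabilize between rounds.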

