LG · Oct 4, 2023

Heterogeneous Federated Learning Using Knowledge Codistillation

Jared Lichtarge, Ehsan Amid, Shankar Kumar, Tien-Ju Yang, Rohan Anil, Rajiv Mathews

DeepMind

arXiv:2310.02549v1 · 2.0 · h-index: 21

Originality: Incremental advance

AI Analysis

The paper addresses inefficient use of model capacity in federated learning for applications such as image and language processing. The advance is incremental because it builds on existing knowledge distillation techniques.

The paper tackles a limitation of federated learning algorithms: all clients must use an identical model architecture, which restricts performance. It proposes bidirectional knowledge distillation between a small model and a large model, carried out on a server with unlabeled data, and reports improvements over federated averaging on image classification and language modeling tasks.

Federated Averaging, and the many federated learning algorithm variants that build upon it, have a limitation: all clients must share the same model architecture. This results in unused modeling capacity on many clients, which limits model performance. To address this issue, we propose a method that involves training a small model on the entire pool and a larger model on a subset of clients with higher capacity. The models exchange information bidirectionally via knowledge distillation, utilizing an unlabeled dataset on a server without sharing parameters. We present two variants of our method, which improve upon federated averaging on image classification and language modeling tasks. We show this technique can be useful even if only out-of-domain or limited in-domain distillation data is available. Additionally, the bidirectional knowledge distillation allows for domain transfer between the models when different pool populations introduce domain shift.
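The bidirectional exchange described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the loss, temperature, or training schedule, so everything below is an assumption: it uses the standard temperature-scaled KL distillation loss, and the function names (`distillation_loss`, `codistillation_losses`) and the example logits are hypothetical. The key point it shows is that only predictions on unlabeled server data cross between the two models, never parameters.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Mean KL(teacher || student) over a batch of unlabeled examples.

    Assumption: standard soft-label distillation; the paper's exact
    loss may differ.
    """
    per_example = [
        kl_divergence(softmax(t, temperature), softmax(s, temperature))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(per_example) / len(per_example)

def codistillation_losses(small_logits, large_logits, temperature=2.0):
    """Losses for both directions on the same unlabeled server batch.

    Each model acts as teacher for the other; only logits are
    exchanged, so the two architectures can differ.
    """
    return {
        "small_learns_from_large": distillation_loss(
            large_logits, small_logits, temperature),
        "large_learns_from_small": distillation_loss(
            small_logits, large_logits, temperature),
    }

# Hypothetical logits from each model on two unlabeled server examples.
small_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
large_logits = [[1.0, 1.0, 0.0], [0.0, 0.5, 1.5]]
losses = codistillation_losses(small_logits, large_logits)
```

In a full system, each loss would be minimized with respect to its student model's parameters on the server, interleaved with federated training of the small model on the whole client pool and of the large model on the higher-capacity subset.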
