LG · AI · Jul 14, 2025

Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning

arXiv:2507.10348v3 · 4 citations · h-index: 1
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of aggregating knowledge from heterogeneous models in federated learning. The contribution is incremental: it improves upon existing distillation techniques.

The paper tackles the problem of unstable and inefficient knowledge aggregation in model-heterogeneous federated learning by proposing FedFD, a feature distillation method that uses orthogonal projection to align features, achieving superior performance compared to state-of-the-art methods.

Model-Heterogeneous Federated Learning (Hetero-FL) has attracted growing attention for its ability to aggregate knowledge from heterogeneous models while keeping private data local. To better aggregate knowledge from clients, ensemble distillation, a widely used and effective technique, is often applied after global aggregation to enhance the performance of the global model. However, simply combining Hetero-FL and ensemble distillation does not always yield promising results and can make training unstable. The reason is that existing methods primarily focus on logit distillation, which is model-agnostic because it operates on softmax predictions but fails to compensate for the knowledge bias arising from heterogeneous models. To tackle this challenge, we propose a stable and efficient Feature Distillation method for model-heterogeneous Federated learning, dubbed FedFD, which incorporates aligned feature information via orthogonal projection to better integrate knowledge from heterogeneous models. Specifically, a new feature-based ensemble federated knowledge distillation paradigm is proposed. The global model on the server maintains a projection layer for each client-side model architecture so that features are aligned separately. Orthogonal techniques are employed to re-parameterize the projection layer, mitigating knowledge bias from heterogeneous models and thus maximizing the distilled knowledge. Extensive experiments show that FedFD achieves superior performance compared to state-of-the-art methods.
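
The idea of one orthogonally re-parameterized projection layer per client architecture can be illustrated with a small NumPy sketch. This is not the authors' implementation, and several details are assumptions made only for illustration: the projection maps global features into each client's feature space, orthogonality is obtained through a QR decomposition, the distillation loss is a mean squared error averaged over architectures, and the architecture names (`small_cnn`, `wide_resnet`) and feature sizes are hypothetical.

```python
import numpy as np


def orthogonal_projection(raw_weight):
    """Map an unconstrained (d_client, d_global) matrix to one with
    orthonormal rows (d_client <= d_global) or orthonormal columns
    (d_client > d_global), using a reduced QR decomposition.
    Assumption: QR is only one possible orthogonal re-parameterization."""
    d_client, d_global = raw_weight.shape
    if d_client <= d_global:
        # Q of the transpose has orthonormal columns, so Q.T has orthonormal rows.
        q, _ = np.linalg.qr(raw_weight.T)
        return q.T
    q, _ = np.linalg.qr(raw_weight)
    return q


def feature_distillation_loss(global_feats, client_feats, raw_weights):
    """Average, over client architectures, of the MSE between projected
    global features and that architecture's features.
    Assumption: MSE and the global-to-client direction are illustrative."""
    losses = []
    for arch, feats in client_feats.items():
        proj = orthogonal_projection(raw_weights[arch])  # (d_client, d_global)
        aligned = global_feats @ proj.T                  # (batch, d_client)
        losses.append(np.mean((aligned - feats) ** 2))
    return float(np.mean(losses))


# Hypothetical setup: a batch of 8 samples, 64-dim global features,
# and two client architectures with different feature widths.
rng = np.random.default_rng(0)
global_feats = rng.normal(size=(8, 64))
client_feats = {
    "small_cnn": rng.normal(size=(8, 32)),
    "wide_resnet": rng.normal(size=(8, 128)),
}
# One unconstrained weight matrix per architecture; the server would train these.
raw_weights = {
    arch: rng.normal(size=(feats.shape[1], 64))
    for arch, feats in client_feats.items()
}

loss = feature_distillation_loss(global_feats, client_feats, raw_weights)
print(f"feature distillation loss: {loss:.4f}")
```

Keeping a separate matrix per architecture means each client feature width gets its own alignment, and the orthogonality constraint prevents the projection from rescaling or collapsing directions of the global feature space. In a real training loop the raw weights would be optimized jointly with the global model; that step is omitted here.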

