Boosting Generalization Performance in Model-Heterogeneous Federated Learning Using Variational Transposed Convolution
This work addresses data and model heterogeneity in federated learning, improving generalization for distributed clients with varying architectures. The contribution is incremental, since it builds on existing FL paradigms.
The paper tackles low generalization performance in model-heterogeneous federated learning. It proposes a framework in which clients exchange feature distributions instead of model parameters and use a variational transposed convolutional network to generate synthetic data for fine-tuning. Compared with existing methods, the framework achieves higher generalization accuracy, lower communication costs, and reduced memory consumption.
Federated learning (FL) is a pioneering machine learning paradigm that enables distributed clients to process local data effectively while ensuring data privacy. However, the efficacy of FL is usually impeded by data heterogeneity among clients, resulting in local models with low generalization performance. To address this problem, traditional model-homogeneous approaches mainly debias the local training procedures with regularization or dynamically adjust client weights during aggregation. Nonetheless, these approaches are incompatible with scenarios where clients have heterogeneous model architectures. In this paper, we propose a model-heterogeneous FL framework that improves clients' generalization performance on unseen data without model aggregation. Instead of model parameters, clients exchange feature distributions, specifically the mean and the covariance, with the server. Accordingly, clients train a variational transposed convolutional (VTC) neural network with Gaussian latent variables sampled from the feature distributions, and use the VTC model to generate synthetic data. By fine-tuning local models with the synthetic data, clients significantly increase their generalization performance. Experimental results show that our approach obtains higher generalization accuracy than existing model-heterogeneous FL frameworks, as well as lower communication costs and memory consumption.
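The exchange of feature statistics described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (local_feature_stats, pool_stats, sample_latents) are hypothetical, and the sample-count-weighted pooling rule on the server is an assumption, since the abstract does not say how per-client statistics are combined. The VTC network itself, which would consume the sampled latents, is not shown.

```python
import numpy as np


def local_feature_stats(features):
    """Client side: summarize an (n, d) feature matrix as (n, mean, covariance)."""
    n = features.shape[0]
    mean = features.mean(axis=0)
    centered = features - mean
    cov = centered.T @ centered / n  # population (biased) covariance
    return n, mean, cov


def pool_stats(stats):
    """Server side (assumed rule): merge per-client statistics by sample count.

    With population covariances this reproduces the mean and covariance of
    the concatenated features without any client sharing raw features.
    """
    total = sum(n for n, _, _ in stats)
    global_mean = sum(n * m for n, m, _ in stats) / total
    global_cov = sum(
        n * (c + np.outer(m - global_mean, m - global_mean)) for n, m, c in stats
    ) / total
    return global_mean, global_cov


def sample_latents(mean, cov, num, rng):
    """Draw Gaussian latent vectors, shape (num, d), from the shared distribution."""
    return rng.multivariate_normal(mean, cov, size=num)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical clients with differently distributed 4-dimensional features.
    client_a = rng.normal(loc=0.0, size=(50, 4))
    client_b = rng.normal(loc=2.0, size=(30, 4))
    stats = [local_feature_stats(client_a), local_feature_stats(client_b)]
    mean, cov = pool_stats(stats)
    latents = sample_latents(mean, cov, 8, rng)
    print(latents.shape)
```

Under this assumption each client uploads only a d-dimensional mean and a d-by-d covariance, so the upload size depends on the feature dimension and not on the size of the client's model.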