LG, OC, ML · Feb 19, 2020

Personalized Federated Learning: A Meta-Learning Approach

arXiv:2002.07948v4 · 705 citations
AI Analysis

This paper addresses the need for personalized models in federated learning when users have heterogeneous data distributions. It is an incremental advance that combines existing methods.

The paper tackles model personalization in federated learning with a meta-learning approach: it learns a shared initial model that each user can adapt with one or a few gradient steps on local data. It evaluates the resulting algorithm's performance in terms of gradient norm for non-convex losses and characterizes how that performance depends on the distance between users' data distributions.

In Federated Learning, we aim to train models across multiple computing units (users), while users can only communicate with a common central server, without exchanging their data samples. This mechanism exploits the computational power of all users and allows users to obtain a richer model as their models are trained over a larger set of data points. However, this scheme only develops a common output for all the users, and, therefore, it does not adapt the model to each user. This is an important missing feature, especially given the heterogeneity of the underlying data distribution for various users. In this paper, we study a personalized variant of the federated learning in which our goal is to find an initial shared model that current or new users can easily adapt to their local dataset by performing one or a few steps of gradient descent with respect to their own data. This approach keeps all the benefits of the federated learning architecture, and, by structure, leads to a more personalized model for each user. We show this problem can be studied within the Model-Agnostic Meta-Learning (MAML) framework. Inspired by this connection, we study a personalized variant of the well-known Federated Averaging algorithm and evaluate its performance in terms of gradient norm for non-convex loss functions. Further, we characterize how this performance is affected by the closeness of underlying distributions of user data, measured in terms of distribution distances such as Total Variation and 1-Wasserstein metric.
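The idea in the abstract, a shared initialization that each user refines with a gradient step on local data, can be illustrated with a small toy. The sketch below is not the paper's implementation. It assumes scalar quadratic losses f_i(w) = 0.5 * (w - c_i)^2, exact gradients, and full participation of all users in every round, and every name in it (`meta_grad`, `personalized_fedavg`, `personalize`, `centers`) is hypothetical. Each user takes local steps on the MAML-style objective F_i(w) = f_i(w - alpha * grad f_i(w)), and the server averages the results.

```python
# Toy sketch of MAML-style personalized federated averaging.
# Assumptions (not from the paper): scalar quadratic losses, exact gradients,
# every user participates in every round. All names are illustrative.

HESSIAN = 1.0  # second derivative of f_i(w) = 0.5 * (w - c)**2


def grad(w, c):
    """Gradient of the toy local loss f_i(w) = 0.5 * (w - c)**2."""
    return w - c


def meta_grad(w, c, alpha):
    """Gradient of F_i(w) = f_i(w - alpha * grad f_i(w)) via the chain rule."""
    adapted = w - alpha * grad(w, c)
    return (1.0 - alpha * HESSIAN) * grad(adapted, c)


def personalized_fedavg(centers, alpha=0.1, beta=0.5, rounds=50, local_steps=5):
    """Server loop: broadcast w, run local meta-gradient steps, average."""
    w = 0.0
    for _ in range(rounds):
        local_models = []
        for c in centers:
            w_i = w  # each user starts from the shared model
            for _ in range(local_steps):
                w_i -= beta * meta_grad(w_i, c, alpha)
            local_models.append(w_i)
        w = sum(local_models) / len(local_models)
    return w


def personalize(w, c, alpha=0.1, steps=1):
    """Adapt the shared model to one user with a few plain gradient steps."""
    for _ in range(steps):
        w -= alpha * grad(w, c)
    return w


centers = [-1.0, 0.0, 4.0]  # each user's loss minimizer (heterogeneous data)
w_shared = personalized_fedavg(centers)
w_personal = [personalize(w_shared, c) for c in centers]
```

In this toy the shared model settles at the average of the users' minimizers, and a single personalization step moves each user's copy toward its own minimizer. With non-quadratic losses the Hessian term is no longer constant and would have to be computed or approximated.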

