LG · CV · Jun 10, 2021

Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning

arXiv:2106.06047v2 · 227 citations · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of training models on heterogeneous data across devices in federated learning, offering an architectural alternative to optimization-focused methods.

The paper tackles data heterogeneity in federated learning by showing that self-attention-based architectures such as Transformers are more robust to distribution shifts than convolutional networks, which reduces catastrophic forgetting, accelerates convergence, and yields a better global model.

Federated learning is an emerging research paradigm enabling collaborative training of machine learning models among different organizations while keeping data private at each institution. Despite recent progress, there remain fundamental challenges such as the lack of convergence and the potential for catastrophic forgetting across real-world heterogeneous devices. In this paper, we demonstrate that self-attention-based architectures (e.g., Transformers) are more robust to distribution shifts and hence improve federated learning over heterogeneous data. Concretely, we conduct the first rigorous empirical investigation of different neural architectures across a range of federated algorithms, real-world benchmarks, and heterogeneous data splits. Our experiments show that simply replacing convolutional networks with Transformers can greatly reduce catastrophic forgetting of previous devices, accelerate convergence, and reach a better global model, especially when dealing with heterogeneous data. We release our code and pretrained models at https://github.com/Liangqiong/ViT-FL-main to encourage future exploration in robust architectures as an alternative to current research efforts on the optimization front.
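
For readers new to the federated setting the abstract refers to, the following is a minimal sketch of the server-side aggregation step in federated averaging (FedAvg), a common baseline algorithm. It is an illustration under stated assumptions, not the paper's code: the function name `fedavg`, the `Params` type, and the toy parameter values are all hypothetical, and the abstract does not say which aggregation rules the authors evaluate. Each client's parameters are weighted by its local sample count, which is where unequal or heterogeneous client data enters the global model.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its sample count."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client, and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        # Weighted mean of each coordinate across all clients.
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


if __name__ == "__main__":
    # Two toy clients with different amounts of local data (assumed values).
    clients = [{"w": [0.0, 2.0]}, {"w": [4.0, 6.0]}]
    sizes = [1, 3]
    print(fedavg(clients, sizes))
```

In this sketch the architecture is irrelevant to the aggregation step: the same averaging applies whether the parameters come from a convolutional network or a Transformer, which is why the architecture can be swapped without changing the federated algorithm.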

Code Implementations: 1 repo
