LG, CV | Sep 21, 2024

Recovering Global Data Distribution Locally in Federated Learning

Peking U
arXiv:2409.14063v1
h-index: 7
Originality: Highly original
AI Analysis

This paper addresses data heterogeneity in federated learning and offers an incremental improvement for privacy-preserving distributed ML.

The paper tackles label imbalance in Federated Learning by proposing ReGL, which uses client-side generative models to synthesize images for minority and missing classes, achieving state-of-the-art performance on image classification datasets.

Federated Learning (FL) is a distributed machine learning paradigm that enables multiple clients to collaboratively train a shared model without sharing raw data. However, a major challenge in FL is label imbalance, where a client may hold data from only a few classes, leaving many classes underrepresented (minority) or entirely absent (missing). Previous works focus on optimizing local updates or global aggregation but ignore the underlying imbalanced label distribution across clients. In this paper, we propose a novel approach, ReGL, to address this challenge, whose key idea is to Recover the Global data distribution Locally. Specifically, each client uses generative models to synthesize images that complement the minority and missing classes, thereby alleviating label imbalance. Moreover, we adaptively fine-tune the image generation process using local real data, which makes the synthetic images align more closely with the global distribution. Importantly, both the generation and fine-tuning processes are conducted on the client side without compromising data privacy. Through comprehensive experiments on various image classification datasets, we demonstrate the superiority of our approach over existing state-of-the-art works in tackling label imbalance in FL.
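To make the idea of complementing minority and missing classes concrete, here is a minimal sketch of how a client might decide how many synthetic images to request per class. The abstract does not specify ReGL's allocation rule, so the rule used here (top up every class to the size of the client's largest local class) and the helper name `synthetic_quota` are assumptions for illustration only; the generative model and the fine-tuning step are not modeled.

```python
from collections import Counter


def synthetic_quota(local_labels, num_classes, target_per_class=None):
    """Return {class_id: number of synthetic images to generate}.

    Hypothetical helper, not from the paper. By default each class is
    topped up to the count of the client's largest local class.
    """
    counts = Counter(local_labels)
    if target_per_class is None:
        # Assumed balancing target: the largest local class size.
        target_per_class = max(counts.values(), default=0)
    return {
        c: target_per_class - counts.get(c, 0)
        for c in range(num_classes)
        if counts.get(c, 0) < target_per_class
    }


# A client with 50 images of class 0, 5 of class 1, and none of class 2.
labels = [0] * 50 + [1] * 5
quota = synthetic_quota(labels, num_classes=3)
print(quota)  # {1: 45, 2: 50}
```

In this toy case class 1 is a minority class and class 2 is a missing class; both receive a quota, while the majority class 0 receives none. A real system would pass these quotas to a client-side generator and could choose a different target, for example one based on the dataset's known class list and a fixed budget.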

