DCLG, Oct 16, 2024

Disentangling data distribution for Federated Learning

arXiv:2410.12530v2, 1 citation, h-index: 18
Originality: Highly original
AI Analysis

This paper addresses efficiency and utility problems in federated learning across distributed clients that hold private data, and it proposes a novel method rather than an incremental improvement.

The paper tackles the problem of entangled data distributions, which hinder federated learning efficiency, by proposing FedDistr, an algorithm that uses stable diffusion models to decouple the distributions. Results on the CIFAR100 and DomainNet datasets show that it significantly improves model utility and efficiency while requiring only one communication round.
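The summary does not define "entangled" precisely. One simple reading, assumed here purely for illustration, is that client distributions are disentangled when no two clients hold the same class. The hypothetical helpers below (`is_disentangled`, `overlap_ratio`) check that condition and report the fraction of shared classes as a crude proxy for the "near-disentangled" case; the paper may define these notions differently, for example in a feature space rather than over labels.

```python
from itertools import combinations
from typing import Dict, Set


def is_disentangled(client_labels: Dict[str, Set[int]]) -> bool:
    """True if the clients' label sets are pairwise disjoint (assumed definition)."""
    for (_, a), (_, b) in combinations(client_labels.items(), 2):
        if a & b:  # at least one class is held by both clients
            return False
    return True


def overlap_ratio(client_labels: Dict[str, Set[int]]) -> float:
    """Fraction of distinct classes held by more than one client (0.0 = disjoint)."""
    counts: Dict[int, int] = {}
    for labels in client_labels.values():
        for y in labels:
            counts[y] = counts.get(y, 0) + 1
    if not counts:
        return 0.0
    shared = sum(1 for c in counts.values() if c > 1)
    return shared / len(counts)


disjoint = {"A": {0, 1, 2}, "B": {3, 4}, "C": {5}}
near = {"A": {0, 1, 2}, "B": {2, 3}, "C": {4}}  # class 2 is shared by A and B
print(is_disentangled(disjoint), is_disentangled(near))  # True False
print(overlap_ratio(near))  # 0.2 (one shared class out of five)
```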

Federated Learning (FL) facilitates collaborative training of a global model whose performance is boosted by private data owned by distributed clients, without compromising data privacy. Yet the wide applicability of FL is hindered by the entanglement of data distributions across different clients. This paper demonstrates for the first time that, by disentangling data distributions, FL can in principle achieve efficiency comparable to that of distributed systems, requiring only one round of communication. To this end, we propose a novel algorithm, FedDistr, which employs stable diffusion models to decouple and recover data distributions. Empirical results on the CIFAR100 and DomainNet datasets show that FedDistr significantly enhances model utility and efficiency in both disentangled and near-disentangled scenarios while ensuring privacy, outperforming traditional federated learning methods.
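The abstract does not spell out the protocol, so the following is only a toy sketch of the general "summarize locally, upload once, regenerate on the server" pattern it describes, not FedDistr itself. FedDistr relies on stable diffusion models; here a per-class diagonal Gaussian over feature vectors stands in for that generative model, a nearest-centroid classifier stands in for the global model, and every function name (`client_summarize`, `server_merge`, `server_synthesize`, `fit_centroids`, `predict`) is hypothetical.

```python
import random
from statistics import fmean
from typing import Dict, List, Tuple

Vector = List[float]
Summary = Tuple[int, Vector, Vector]  # (count, per-dim mean, per-dim variance)


def client_summarize(data: Dict[int, List[Vector]]) -> Dict[int, Summary]:
    """Client side: compress each local class into diagonal-Gaussian statistics."""
    out = {}
    for label, xs in data.items():
        dims = list(zip(*xs))
        mean = [fmean(d) for d in dims]
        var = [fmean([(v - m) ** 2 for v in d]) for d, m in zip(dims, mean)]
        out[label] = (len(xs), mean, var)
    return out


def server_merge(uploads: List[Dict[int, Summary]]) -> Dict[int, Summary]:
    """Server side: pool uploads; a class held by several clients gets pooled moments."""
    merged: Dict[int, Summary] = {}
    for up in uploads:
        for label, (n, mean, var) in up.items():
            if label not in merged:  # disentangled case: no conflict to resolve
                merged[label] = (n, mean, var)
                continue
            n0, m0, v0 = merged[label]
            tot = n0 + n
            m = [(n0 * a + n * b) / tot for a, b in zip(m0, mean)]
            # pool second moments E[x^2], then subtract the pooled mean squared
            v = [max((n0 * (va + a * a) + n * (vb + b * b)) / tot - c * c, 0.0)
                 for a, b, va, vb, c in zip(m0, mean, v0, var, m)]
            merged[label] = (tot, m, v)
    return merged


def server_synthesize(merged: Dict[int, Summary], per_class: int,
                      seed: int = 0) -> List[Tuple[Vector, int]]:
    """Server side: sample synthetic labelled features from each class Gaussian."""
    rng = random.Random(seed)
    return [([rng.gauss(m, max(v, 0.0) ** 0.5) for m, v in zip(mean, var)], label)
            for label, (_, mean, var) in merged.items() for _ in range(per_class)]


def fit_centroids(samples: List[Tuple[Vector, int]]) -> Dict[int, Vector]:
    """Train a nearest-centroid classifier on the synthetic samples."""
    by_label: Dict[int, List[Vector]] = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(x)
    return {y: [fmean(d) for d in zip(*xs)] for y, xs in by_label.items()}


def predict(centroids: Dict[int, Vector], x: Vector) -> int:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)))


client_a = {0: [[0.0, 0.1], [0.2, -0.1]], 1: [[5.0, 5.1], [4.8, 4.9]]}
client_b = {2: [[-5.0, 5.0], [-5.2, 4.8]]}
uploads = [client_summarize(client_a), client_summarize(client_b)]  # one round
model = fit_centroids(server_synthesize(server_merge(uploads), per_class=50))
print(predict(model, [4.9, 5.0]))  # 1
```

Only summary statistics cross the network, and only once; whether such statistics are private enough is a separate question that this toy does not address.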
