LG | Apr 18, 2025

Stratify: Rethinking Federated Learning for Non-IID Data through Balanced Sampling

Hui Yeok Wong, Chee Kau Lim, Chee Seng Chan

arXiv:2504.13462v1 | h-index: 4

Originality: Highly original

AI Analysis

The paper addresses the critical problem of data heterogeneity in federated learning for applications such as image classification, offering a systematic solution rather than incremental adjustments to existing methods.

The paper tackles the challenge of Federated Learning on non-IID data by introducing Stratify, a framework that uses stratified sampling and label-aware client selection to manage class and feature distributions, achieving performance comparable to IID baselines and accelerating convergence across multiple datasets.

Federated Learning (FL) on non-independently and identically distributed (non-IID) data remains a critical challenge, as existing approaches struggle with severe data heterogeneity. Current methods primarily address symptoms of non-IID by applying incremental adjustments to Federated Averaging (FedAvg), rather than directly resolving its inherent design limitations. Consequently, performance significantly deteriorates under highly heterogeneous conditions, as the fundamental issue of imbalanced exposure to diverse class and feature distributions remains unresolved. This paper introduces Stratify, a novel FL framework designed to systematically manage class and feature distributions throughout training, effectively tackling the root cause of non-IID challenges. Inspired by classical stratified sampling, our approach employs a Stratified Label Schedule (SLS) to ensure balanced exposure across labels, significantly reducing bias and variance in aggregated gradients. Complementing SLS, we propose a label-aware client selection strategy, restricting participation exclusively to clients possessing data relevant to scheduled labels. Additionally, Stratify incorporates a fine-grained, high-frequency update scheme, accelerating convergence and further mitigating data heterogeneity. To uphold privacy, we implement a secure client selection protocol leveraging homomorphic encryption, enabling precise global label statistics without disclosing sensitive client information. Extensive evaluations on MNIST, CIFAR-10, CIFAR-100, Tiny-ImageNet, COVTYPE, PACS, and Digits-DG demonstrate that Stratify attains performance comparable to IID baselines, accelerates convergence, and reduces client-side computation compared to state-of-the-art methods, underscoring its practical effectiveness in realistic federated learning scenarios.
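
To make the two central ideas in the abstract more concrete, here is a minimal Python sketch of a balanced label schedule and of label-aware client selection. It is an illustration under stated assumptions, not the authors' algorithm: the abstract does not specify how the Stratified Label Schedule is built, so the sketch assumes a simple scheme in which every label with data appears once per shuffled pass. The function names (`build_label_schedule`, `eligible_clients`, `select_client`) and the plain-dictionary label statistics are hypothetical, and the homomorphic-encryption protocol the paper uses to gather those statistics privately is left out entirely.

```python
import random


def build_label_schedule(global_counts, steps, seed=0):
    """Return `steps` labels in which every label appears (almost) equally often.

    Assumption: one shuffled pass over all labels that have data is repeated
    until the schedule is long enough. The paper's actual SLS may differ.
    """
    rng = random.Random(seed)
    labels = sorted(label for label, n in global_counts.items() if n > 0)
    if not labels:
        raise ValueError("no label has any samples")
    schedule = []
    while len(schedule) < steps:
        block = labels[:]   # one pass: every label exactly once
        rng.shuffle(block)  # randomise the order inside the pass
        schedule.extend(block)
    return schedule[:steps]


def eligible_clients(client_labels, label):
    """Return the sorted ids of clients holding at least one sample of `label`."""
    return sorted(
        cid for cid, counts in client_labels.items() if counts.get(label, 0) > 0
    )


def select_client(client_labels, label, rng):
    """Pick one eligible client at random, or None if no client holds `label`."""
    candidates = eligible_clients(client_labels, label)
    return rng.choice(candidates) if candidates else None


if __name__ == "__main__":
    # Hypothetical per-client label counts for a highly non-IID split.
    client_labels = {
        "a": {0: 50, 1: 50},
        "b": {2: 100},
        "c": {1: 30, 2: 70},
    }
    # Global counts are summed in the clear here; the paper obtains them
    # through a secure protocol instead.
    global_counts = {}
    for counts in client_labels.values():
        for label, n in counts.items():
            global_counts[label] = global_counts.get(label, 0) + n

    rng = random.Random(1)
    for step, label in enumerate(build_label_schedule(global_counts, steps=6)):
        print(step, label, select_client(client_labels, label, rng))
```

In this sketch each scheduled step trains on a single label and only a client that actually holds that label is asked to participate, which is one plausible reading of "restricting participation exclusively to clients possessing data relevant to scheduled labels."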
