CR · LG · May 19, 2025

Optimal Client Sampling in Federated Learning with Client-Level Heterogeneous Differential Privacy

arXiv:2505.13655v2 · 2 citations · h-index: 10 · IEEE Internet of Things Journal
Originality: Highly original
AI Analysis

This paper addresses the excessive noise and degraded model utility that arise in federated learning when clients have different privacy requirements, offering a theoretically grounded solution under an honest-but-curious attack model.

The paper tackles model utility degradation in federated learning with client-level heterogeneous differential privacy. It proposes GDPFed, which partitions clients into groups by privacy budget and achieves client-level DP within each group, and GDPFed+, which adds model sparsification and optimizes the per-group client sampling ratios. In empirical evaluations, GDPFed+ shows substantial performance gains over state-of-the-art methods.

Federated Learning with client-level differential privacy (DP) provides a promising framework for collaboratively training models while rigorously protecting clients' privacy. However, classic approaches like DP-FedAvg struggle when clients have heterogeneous privacy requirements, as they must uniformly enforce the strictest privacy level across clients, leading to excessive DP noise and significant model utility degradation. Existing methods to improve the model utility in such heterogeneous privacy settings often assume a trusted server and are largely heuristic, resulting in suboptimal performance and lacking strong theoretical underpinnings. In this work, we address these challenges under a practical attack model where both clients and the server are honest-but-curious. We propose GDPFed, which partitions clients into groups based on their privacy budgets and achieves client-level DP within each group to reduce the privacy budget waste and hence improve the model utility. Based on the privacy and convergence analysis of GDPFed, we find that the magnitude of DP noise depends on both model dimensionality and the per-group client sampling ratios. To further improve the performance of GDPFed, we introduce GDPFed$^+$, which integrates model sparsification to eliminate unnecessary noise and optimizes per-group client sampling ratios to minimize convergence error. Extensive empirical evaluations on multiple benchmark datasets demonstrate the effectiveness of GDPFed$^+$, showing substantial performance gains compared with state-of-the-art methods.
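The grouping idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the exact-match grouping rule, the Poisson sampling, and the noise scale `noise_multiplier * max_norm` are all assumptions made for illustration. How the noise multiplier is calibrated to each group's budget, how the sampling ratios are optimized, and where noise is injected under the honest-but-curious model are all left out.

```python
import math
import random
from collections import defaultdict


def partition_by_budget(budgets):
    """Group client ids by privacy budget epsilon (hypothetical exact-match rule)."""
    groups = defaultdict(list)
    for client_id, eps in budgets.items():
        groups[eps].append(client_id)
    return dict(groups)


def poisson_sample(clients, ratio, rng):
    """Include each client independently with probability `ratio`."""
    return [c for c in clients if rng.random() < ratio]


def clip(update, max_norm):
    """Scale `update` so that its L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def noisy_group_sum(updates, max_norm, noise_multiplier, rng):
    """Sum clipped updates, then add Gaussian noise per coordinate.

    The noise standard deviation is noise_multiplier * max_norm (an assumed
    convention); calibrating noise_multiplier to a privacy budget is omitted.
    """
    if not updates:
        raise ValueError("no updates to aggregate")
    total = [0.0] * len(updates[0])
    for u in updates:
        for i, x in enumerate(clip(u, max_norm)):
            total[i] += x
    sigma = noise_multiplier * max_norm
    return [t + rng.gauss(0.0, sigma) for t in total]


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder budgets, sampling ratios and noise multipliers per group.
    budgets = {"a": 1.0, "b": 1.0, "c": 8.0, "d": 8.0}
    ratios = {1.0: 0.5, 8.0: 1.0}
    multipliers = {1.0: 2.0, 8.0: 0.5}
    for eps, members in partition_by_budget(budgets).items():
        sampled = poisson_sample(members, ratios[eps], rng)
        updates = [[rng.uniform(-1, 1) for _ in range(3)] for _ in sampled]
        if updates:
            print(eps, sampled, noisy_group_sum(updates, 1.0, multipliers[eps], rng))
```

Because each group is handled separately, a group with a looser budget can use a smaller noise multiplier instead of inheriting the strictest one, which is the budget waste the abstract attributes to DP-FedAvg.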
