LGFeb 5, 2025

The Cost of Shuffling in Private Gradient Based Optimization

Shuli Jiang, Pranay Sharma, Zhiwei Steven Wu, Gauri Joshi

arXiv:2502.03652v17.11 citationsh-index: 6

Originality Incremental advance

AI Analysis

This addresses the gap between theory and practice in private optimization for machine learning practitioners, though it appears incremental as it builds on existing DP-SGD and shuffling methods.

The paper tackles the problem of differentially private convex empirical risk minimization, showing that data shuffling in private gradient methods (DP-ShuffleG) results in worse empirical excess risk compared to DP-SGD, with a proposed hybrid approach (Interleaved-ShuffleG) that reduces this risk by integrating public data samples.

We consider the problem of differentially private (DP) convex empirical risk minimization (ERM). While the standard DP-SGD algorithm is theoretically well-established, practical implementations often rely on shuffled gradient methods that traverse the training data sequentially rather than sampling with replacement in each iteration. Despite their widespread use, the theoretical privacy-accuracy trade-offs of private shuffled gradient methods (\textit{DP-ShuffleG}) remain poorly understood, leading to a gap between theory and practice. In this work, we leverage privacy amplification by iteration (PABI) and a novel application of Stein's lemma to provide the first empirical excess risk bound of \textit{DP-ShuffleG}. Our result shows that data shuffling results in worse empirical excess risk for \textit{DP-ShuffleG} compared to DP-SGD. To address this limitation, we propose \textit{Interleaved-ShuffleG}, a hybrid approach that integrates public data samples in private optimization. By alternating optimization steps that use private and public samples, \textit{Interleaved-ShuffleG} effectively reduces empirical excess risk. Our analysis introduces a new optimization framework with surrogate objectives, adaptive noise injection, and a dissimilarity metric, which can be of independent interest. Our experiments on diverse datasets and tasks demonstrate the superiority of \textit{Interleaved-ShuffleG} over several baselines.

View on arXiv PDF

Similar