CR · LG · Nov 15, 2024

To Shuffle or not to Shuffle: Auditing DP-SGD with Shuffling

arXiv:2411.10614v2 · 11 citations · h-index: 55
Originality: Incremental advance
AI Analysis

This work highlights a critical risk for practitioners who use differential privacy in machine learning: it reveals significant gaps between reported theoretical guarantees and actual leakage. That makes it an incremental but important audit of existing methods.

The paper tackles the problem of overestimated privacy guarantees in DP-SGD when shuffling replaces Poisson sub-sampling. It shows that actual privacy leakage can exceed the reported guarantees by up to 4 times, and that two common variations of the shuffling procedure widen the gap to up to 10 times.
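Audits of this kind typically quantify "actual leakage" as an empirical lower bound on ε, derived from how well a membership-style distinguisher can tell whether a target example was used in training. The sketch below shows the standard conversion from a distinguisher's false-positive and false-negative rates to such a bound, using the hypothesis-testing characterization of (ε, δ)-DP. It is a generic illustration, not the paper's auditing procedure; the function name `empirical_epsilon` is hypothetical, and real audits replace the point estimates with confidence bounds (e.g. Clopper-Pearson) computed over many trials.

```python
import math


def empirical_epsilon(fpr: float, fnr: float, delta: float = 0.0) -> float:
    """Hypothetical helper: lower bound on epsilon implied by a test's error rates.

    Under (epsilon, delta)-DP, every test deciding "target was in training"
    vs. "target was absent" must satisfy both
        fpr + exp(eps) * fnr >= 1 - delta
        fnr + exp(eps) * fpr >= 1 - delta
    so the observed (fpr, fnr) rule out every epsilon below the returned value.
    """
    bounds = []
    for a, b in ((fpr, fnr), (fnr, fpr)):
        numerator = 1.0 - delta - a
        if numerator <= 0:
            continue  # this inequality is satisfied for any epsilon; no information
        if b == 0:
            return math.inf  # a perfect test is incompatible with finite epsilon
        bounds.append(math.log(numerator / b))
    # Epsilon is non-negative, so a weak test yields the trivial bound 0.
    return max([0.0] + bounds)


# Example: a distinguisher with 5% false positives and 40% false negatives.
print(empirical_epsilon(fpr=0.05, fnr=0.40))
```

Comparing such a lower bound with the ε a model reports is what exposes a gap: if a statistically valid lower bound exceeds the reported value, the reported guarantee cannot hold for that training procedure.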

The Differentially Private Stochastic Gradient Descent (DP-SGD) algorithm allows the training of machine learning (ML) models with formal Differential Privacy (DP) guarantees. Since DP-SGD processes training data in batches, it employs Poisson sub-sampling to select each batch at every step. However, it has become common practice to replace sub-sampling with shuffling owing to better compatibility and lower computational overhead. At the same time, we do not know how to compute tight theoretical guarantees for shuffling; thus, DP guarantees of models privately trained with shuffling are often reported as though Poisson sub-sampling was used. This prompts the need to verify whether gaps exist between the theoretical DP guarantees reported by state-of-the-art models and their actual leakage. To do so, we introduce a novel DP auditing procedure to analyze DP-SGD with shuffling and show that DP models trained with this approach have considerably overestimated privacy guarantees (up to 4 times). In the process, we assess the impact on privacy leakage of several parameters, including batch size, privacy budget, and threat model. Finally, we study two common variations of the shuffling procedure that result in even further privacy leakage (up to 10 times). Overall, our work attests to the risk of using shuffling instead of Poisson sub-sampling vis-à-vis privacy leakage from DP-SGD.
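The two batching schemes contrasted in the abstract can be sketched in a few lines. Under Poisson sub-sampling, each example joins every batch independently with probability q, so batch sizes vary and, over an epoch's worth of steps, an example may appear in zero, one, or several batches. Under shuffling, the data are permuted once per epoch and cut into fixed-size batches, so each example appears exactly once per epoch. The standard privacy-amplification accounting for DP-SGD assumes the former. This is a minimal sketch assuming in-memory integer indices; `poisson_batches` and `shuffled_batches` are hypothetical helpers, and dropping the final incomplete batch is one common choice, not necessarily what the paper's shuffling variants do.

```python
import random
from typing import List


def poisson_batches(n: int, q: float, steps: int, rng: random.Random) -> List[List[int]]:
    """Poisson sub-sampling: at every step, each of the n examples is
    included independently with probability q (batch size is random)."""
    return [[i for i in range(n) if rng.random() < q] for _ in range(steps)]


def shuffled_batches(n: int, batch_size: int, epochs: int, rng: random.Random) -> List[List[int]]:
    """Shuffling: permute the indices once per epoch, then cut the permutation
    into consecutive fixed-size batches (the incomplete last batch is dropped)."""
    batches = []
    for _ in range(epochs):
        perm = list(range(n))
        rng.shuffle(perm)
        for start in range(0, n - batch_size + 1, batch_size):
            batches.append(perm[start:start + batch_size])
    return batches


rng = random.Random(0)
print([len(b) for b in poisson_batches(n=1000, q=0.01, steps=5, rng=rng)])  # sizes vary around 10
print([len(b) for b in shuffled_batches(n=1000, batch_size=10, epochs=1, rng=rng)][:5])  # always 10
```

With q = B/n, the Poisson batch size has expectation B but fluctuates from step to step, which is part of why fixed-size shuffled batches are often considered more convenient to implement, even though the usual DP accounting does not directly cover them.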

