LG CR | Nov 24, 2023

Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach

Xinwei Zhang, Zhiqi Bu, Zhiwei Steven Wu, Mingyi Hong

arXiv:2311.14632v215.517 citations | h-index: 13 | Has Code

Originality: Highly original

AI Analysis

The paper addresses a practical problem in training deep learning models on sensitive data by improving utility without sacrificing privacy, though it is incremental in that it builds on existing DP-SGD methods.

The paper tackles the performance degradation that gradient clipping bias causes in differentially private SGD. It proposes an error-feedback algorithm that eliminates the constant bias and allows arbitrary clipping thresholds, achieving higher accuracies on datasets such as CIFAR-10/100 and E2E while maintaining the same privacy guarantees.

Differentially Private Stochastic Gradient Descent with Gradient Clipping (DPSGD-GC) is a powerful tool for training deep learning models on sensitive data, providing both a solid theoretical privacy guarantee and high efficiency. However, using DPSGD-GC to ensure Differential Privacy (DP) degrades model performance because of DP noise injection and gradient clipping. Existing research has extensively analyzed the theoretical convergence of DPSGD-GC and has shown that it converges only with large clipping thresholds that depend on problem-specific parameters. Unfortunately, these parameters are often unknown in practice, making it hard to choose the optimal clipping threshold. Therefore, in practice, DPSGD-GC suffers from degraded performance due to the constant bias introduced by clipping. In our work, we propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC, which not only offers a diminishing utility bound without inducing a constant clipping bias but, more importantly, allows an arbitrary choice of clipping threshold that is independent of the problem. We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on Rényi DP. Additionally, we demonstrate that under mild conditions, our algorithm can achieve nearly the same utility bound as DPSGD without gradient clipping. Our empirical results on the CIFAR-10/100 and E2E datasets show that the proposed algorithm achieves higher accuracies than DPSGD while maintaining the same level of DP guarantee.
