ML | LG | Feb 13, 2024

Implicit Bias in Noisy-SGD: With Applications to Differentially Private Training

arXiv:2402.08344v1 | 3 citations | h-index: 4 | AISTATS
Originality: Incremental advance
AI Analysis

This paper addresses a challenge in differentially private deep learning: strong privacy guarantees require large batches, yet large batches degrade performance. Its analysis offers insight into potential improvements.

The paper investigates why large-batch training in Differentially Private SGD (DP-SGD) leads to poor performance. It shows that the degradation stems from the stochasticity of SGD rather than from gradient clipping, and that the added Gaussian noise amplifies the resulting implicit bias. The theoretical analysis is carried out in linear settings.

Training Deep Neural Networks (DNNs) with small batches using Stochastic Gradient Descent (SGD) yields superior test performance compared to larger batches. The specific noise structure inherent to SGD is known to be responsible for this implicit bias. DP-SGD, used to ensure differential privacy (DP) in DNN training, adds Gaussian noise to the clipped gradients. Surprisingly, large-batch training still results in a significant decrease in performance, which poses an important challenge because strong DP guarantees necessitate the use of massive batches. We first show that the phenomenon extends to Noisy-SGD (DP-SGD without clipping), suggesting that the stochasticity (and not the clipping) is the cause of this implicit bias, even with additional isotropic Gaussian noise. We theoretically analyse the solutions obtained with continuous versions of Noisy-SGD for the Linear Least Squares and Diagonal Linear Network settings, and reveal that the implicit bias is indeed amplified by the additional noise. Thus, the performance issues of large-batch DP-SGD training are rooted in the same underlying principles as SGD, offering hope for potential improvements in large-batch training strategies.
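
The difference between DP-SGD and Noisy-SGD described above can be made concrete with a minimal sketch of a single mini-batch update on a linear least-squares loss. This is an illustration under stated assumptions, not the authors' implementation: the function names (`per_example_grads`, `clip`, `noisy_step`), the parameter names, and the noise scaling (standard deviation `sigma * clip_norm / batch_size` on the averaged gradient, with `clip_norm` treated as 1 when clipping is disabled) are hypothetical choices that follow a common DP-SGD convention.

```python
import math
import random


def per_example_grads(w, xs, ys):
    """Per-example gradients of 0.5 * (w . x - y)^2 (linear least squares)."""
    grads = []
    for x, y in zip(xs, ys):
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        grads.append([residual * xi for xi in x])
    return grads


def clip(g, max_norm):
    """Rescale g so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(gi * gi for gi in g))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [gi * scale for gi in g]


def noisy_step(w, xs, ys, lr, sigma, rng, clip_norm=None):
    """One update on the mini-batch (xs, ys).

    clip_norm=None -> Noisy-SGD: isotropic Gaussian noise, no clipping.
    clip_norm=C    -> DP-SGD-style step: each per-example gradient is
                      clipped to norm C before averaging.
    The noise scale below is an assumed convention, not taken from the paper.
    """
    grads = per_example_grads(w, xs, ys)
    if clip_norm is not None:
        grads = [clip(g, clip_norm) for g in grads]
    batch_size = len(grads)
    mean = [sum(g[j] for g in grads) / batch_size for j in range(len(w))]
    noise_std = sigma * (clip_norm if clip_norm is not None else 1.0) / batch_size
    # Independent Gaussian noise per coordinate (isotropic noise).
    return [wj - lr * (mj + rng.gauss(0.0, noise_std)) for wj, mj in zip(w, mean)]


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
    ys = [1.0, -1.0, 0.5]
    w = [0.0, 0.0]
    print("Noisy-SGD step:", noisy_step(w, xs, ys, lr=0.1, sigma=1.0, rng=rng))
    print("DP-SGD step:   ", noisy_step(w, xs, ys, lr=0.1, sigma=1.0, rng=rng, clip_norm=1.0))
```

With `sigma=0` and `clip_norm=None` the update reduces to a plain mini-batch SGD step, so in this sketch the only sources of randomness are the choice of mini-batch and the added Gaussian term.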

