LG | ML | Sep 7, 2020

Scaling up Differentially Private Deep Learning with Fast Per-Example Gradient Clipping

arXiv:2009.03106v1 | 77 citations
Originality: Incremental advance
AI Analysis

This work addresses a key bottleneck for researchers and practitioners in privacy-preserving machine learning: it enables faster experimentation with differentially private models.

The paper tackles the slow training of differentially private deep learning with new methods for per-example gradient clipping that improve GPU utilization, reporting speed-ups of 54x to 94x across various models at a batch size of 128.

Recent work on Renyi Differential Privacy has shown the feasibility of applying differential privacy to deep learning tasks. Despite their promise, however, differentially private deep networks often lag far behind their non-private counterparts in accuracy, showing the need for more research in model architectures, optimizers, etc. One of the barriers to this expanded research is the training time -- often orders of magnitude larger than training non-private networks. The reason for this slowdown is a crucial privacy-related step called "per-example gradient clipping" whose naive implementation undoes the benefits of batch training with GPUs. By analyzing the back-propagation equations we derive new methods for per-example gradient clipping that are compatible with auto-differentiation (e.g., in PyTorch and TensorFlow) and provide better GPU utilization. Our implementation in PyTorch showed significant training speed-ups (by factors of 54x - 94x for training various models with batch sizes of 128). These techniques work for a variety of architectural choices including convolutional layers, recurrent networks, attention, residual blocks, etc.

Code Implementations: 2 repos
