Fine-Tuning with Differential Privacy Necessitates an Additional Hyperparameter Search
This work addresses the challenge of maintaining accuracy while preserving privacy in machine learning models trained on sensitive data. It is incremental in that it builds on existing fine-tuning paradigms.
The paper tackles the privacy-utility tradeoff in differentially private fine-tuning. It identifies that existing methods do not tailor fine-tuning to privacy constraints, and it shows that carefully selecting which layers to fine-tune achieves new state-of-the-art results, such as 77.9% accuracy on CIFAR-100 at $(\varepsilon, \delta)=(2, 10^{-5})$ for a model pretrained on ImageNet.
Models need to be trained with privacy-preserving learning algorithms to prevent leakage of possibly sensitive information contained in their training data. However, canonical algorithms like differentially private stochastic gradient descent (DP-SGD) do not benefit from model scale in the same way as non-private learning. This manifests itself in the form of unappealing tradeoffs between privacy and utility (accuracy) when using DP-SGD on complex tasks. To remediate this tension, a paradigm is emerging: fine-tuning with differential privacy from a model pretrained on public (i.e., non-sensitive) training data. In this work, we identify an oversight of existing approaches for differentially private fine-tuning: they do not tailor the fine-tuning approach to the specifics of learning with privacy. Our main result is to show how carefully selecting the layers being fine-tuned in the pretrained neural network allows us to establish new state-of-the-art tradeoffs between privacy and accuracy. For instance, we achieve 77.9% accuracy for $(\varepsilon, \delta)=(2, 10^{-5})$ on CIFAR-100 for a model pretrained on ImageNet. Our work calls for additional hyperparameter search to configure the differentially private fine-tuning procedure itself.
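To make the idea of restricting DP-SGD to a chosen set of layers concrete, the following is a minimal NumPy sketch of one generic DP-SGD update (per-example clipping followed by Gaussian noise) applied only to layers named in a `trainable` set. It is an illustration under stated assumptions, not the procedure used in the paper: the function name `dp_sgd_step`, the layer names `backbone` and `head`, and all numeric values are hypothetical, and the abstract does not specify which layers the authors select. Privacy accounting (mapping the noise multiplier to a concrete $(\varepsilon, \delta)$) is deliberately left out.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, trainable,
                clip_norm, noise_multiplier, lr, rng):
    """One DP-SGD update restricted to the layers named in `trainable`.

    params:            dict layer name -> array
    per_example_grads: dict layer name -> array with a leading batch axis
    trainable:         set of layer names to update (hypothetical choice)
    """
    names = sorted(trainable)
    batch_size = per_example_grads[names[0]].shape[0]

    # Per-example L2 norm, computed jointly over the trainable layers only.
    sq_norms = np.zeros(batch_size)
    for name in names:
        flat = per_example_grads[name].reshape(batch_size, -1)
        sq_norms += np.sum(flat ** 2, axis=1)
    norms = np.sqrt(sq_norms)

    # Scale each example's gradient so its norm is at most clip_norm.
    scale = np.minimum(1.0, clip_norm / (norms + 1e-12))

    new_params = dict(params)  # frozen layers are carried over unchanged
    for name in names:
        g = per_example_grads[name]
        bshape = (batch_size,) + (1,) * (g.ndim - 1)
        clipped_sum = np.sum(g * scale.reshape(bshape), axis=0)
        # Gaussian noise with standard deviation noise_multiplier * clip_norm.
        noise = rng.normal(0.0, noise_multiplier * clip_norm,
                           size=clipped_sum.shape)
        new_params[name] = params[name] - lr * (clipped_sum + noise) / batch_size
    return new_params


# Usage: fine-tune only the (hypothetical) "head" layer of a toy model.
rng = np.random.default_rng(0)
params = {"backbone": np.ones((3, 3)), "head": np.zeros((3, 2))}
grads = {"backbone": rng.normal(size=(4, 3, 3)),
         "head": 10.0 * rng.normal(size=(4, 3, 2))}
updated = dp_sgd_step(params, grads, trainable={"head"},
                      clip_norm=1.0, noise_multiplier=1.1, lr=0.5, rng=rng)
```

In this sketch the set of trainable layers is just another argument, so it could be searched over alongside the clipping norm, noise multiplier, and learning rate. One plausible reason the choice matters is that the clipping norm is computed only over the selected layers and noise is added only to them, so fewer trainable parameters means less total injected noise.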