Self-Supervised Pretraining for Differentially Private Learning
This work addresses the problem of preserving data privacy while achieving high utility in machine learning applications such as image classification. It represents an incremental improvement that adapts existing self-supervised methods to DP settings.
The paper tackles the challenge of deep learning with differential privacy (DP) by proposing self-supervised pretraining (SSP) as a scalable solution for image classification. It shows that SSP features improve utility over handcrafted features when public data is scarce and outperform features trained with labels on complex private datasets, reaching 25.3% utility on a private ImageNet-1K dataset at ε=3.
We demonstrate that self-supervised pretraining (SSP) is a scalable solution to deep learning with differential privacy (DP) in image classification, regardless of the size of the available public dataset. When no public dataset is available, we show that the features generated by SSP on a single image enable a private classifier to obtain much better utility than non-learned handcrafted features under the same privacy budget. When a moderate or large public dataset is available, the features produced by SSP greatly outperform features trained with labels on various complex private datasets under the same privacy budget. We also compare multiple DP-enabled training frameworks for training a private classifier on the features generated by SSP. Finally, we report a non-trivial utility of 25.3\% on a private ImageNet-1K dataset when $\epsilon=3$. Our source code can be found at \url{https://github.com/UnchartedRLab/SSP}.
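To illustrate what training a private classifier on fixed SSP features can involve, the following is a minimal sketch of the gradient privatization step used in DP-SGD-style training: each per-example gradient is clipped to a fixed L2 norm, the clipped gradients are summed, and Gaussian noise scaled to the clipping norm is added before averaging. The abstract does not say which DP training frameworks are compared, so treating DP-SGD as the mechanism is an assumption here, and the names `clip_and_noise`, `clip_norm`, and `noise_multiplier` are hypothetical. The sketch uses only the Python standard library and omits privacy accounting, so it does not by itself certify any particular value of ε.

```python
import math
import random


def clip_and_noise(per_example_grads, clip_norm, noise_multiplier, rng):
    """Return a noisy average of per-example gradients (illustrative only).

    per_example_grads: list of equal-length lists of floats, one per example.
    clip_norm: maximum L2 norm allowed for each per-example gradient.
    noise_multiplier: Gaussian noise std is noise_multiplier * clip_norm.
    rng: a random.Random instance, passed in for reproducibility.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(x * x for x in grad))
        # Scale down gradients whose norm exceeds clip_norm; leave others as-is.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for j, x in enumerate(grad):
            total[j] += scale * x
    sigma = noise_multiplier * clip_norm
    n = len(per_example_grads)
    # Noise is added once to the clipped sum, then the result is averaged.
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]


if __name__ == "__main__":
    # Hypothetical gradients of a classifier head over frozen features.
    grads = [[3.0, 4.0], [0.3, 0.4]]
    print(clip_and_noise(grads, clip_norm=1.0, noise_multiplier=1.1,
                         rng=random.Random(0)))
```

In a setting like the one described above, the per-example gradients would come from a classifier trained on features from a frozen SSP encoder, and a separate privacy accountant would translate the noise multiplier, sampling rate, and number of steps into the reported privacy budget.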