LG CV CY | Aug 21, 2023

Unlocking Accuracy and Fairness in Differentially Private Image Classification

Leonard Berrada, Soham De, Judy Hanwen Shen, Jamie Hayes, Robert Stanforth, David Stutz, Pushmeet Kohli, Samuel L. Smith, Borja Balle

DeepMind, Stanford

arXiv:2308.10888v1 | 17.022 citations | h-index: 118 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of making DP training practical for machine learning practitioners who work with sensitive datasets. The authors describe the result as a milestone toward that goal.

The paper tackles the reduced accuracy and the fairness concerns associated with differentially private (DP) image classification. It shows that pre-trained foundation models fine-tuned with DP can reach accuracies within a few percent of the non-private state of the art across four datasets, including two medical imaging benchmarks, and that the private medical classifiers do not exhibit larger performance disparities across demographic groups than non-private models.

Privacy-preserving machine learning aims to train models on private data without leaking sensitive information. Differential privacy (DP) is considered the gold standard framework for privacy-preserving training, as it provides formal privacy guarantees. However, compared to their non-private counterparts, models trained with DP often have significantly reduced accuracy. Private classifiers are also believed to exhibit larger performance disparities across subpopulations, raising fairness concerns. The poor performance of classifiers trained with DP has prevented the widespread adoption of privacy-preserving machine learning in industry. Here we show that pre-trained foundation models fine-tuned with DP can achieve similar accuracy to non-private classifiers, even in the presence of significant distribution shifts between pre-training data and downstream tasks. We achieve private accuracies within a few percent of the non-private state of the art across four datasets, including two medical imaging benchmarks. Furthermore, our private medical classifiers do not exhibit larger performance disparities across demographic groups than non-private models. This milestone to make DP training a practical and reliable technology has the potential to widely enable machine learning practitioners to train safely on sensitive datasets while protecting individuals' privacy.
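
The abstract does not say which training algorithm is used. As an illustration only, a common way to fine-tune with DP is DP-SGD: clip each example's gradient to a fixed norm, sum the clipped gradients, and add Gaussian noise scaled to that norm. The minimal sketch below assumes a logistic-regression head on fixed features; the function names and the parameters `clip_norm` and `noise_multiplier` are hypothetical, and privacy accounting (tracking the resulting privacy budget) is omitted.

```python
import math
import random


def clip_to_norm(vec, max_norm):
    """Scale vec down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, max_norm / norm) if norm > 0.0 else 1.0
    return [v * scale for v in vec]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def per_example_grad(weights, features, label):
    """Gradient of the logistic loss for one example (label is 0 or 1)."""
    logit = sum(w * x for w, x in zip(weights, features))
    return [(sigmoid(logit) - label) * x for x in features]


def dp_sgd_step(weights, batch, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update: clip per-example gradients, sum, add Gaussian noise."""
    total = [0.0] * len(weights)
    for features, label in batch:
        grad = clip_to_norm(per_example_grad(weights, features, label), clip_norm)
        total = [t + g for t, g in zip(total, grad)]
    # Noise standard deviation is proportional to the clipping norm,
    # which bounds how much any single example can change the sum.
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [w - lr * n / len(batch) for w, n in zip(weights, noisy)]


# Example usage with toy data (assumed values, not from the paper).
rng = random.Random(0)
batch = [([1.0, 2.0], 1), ([0.5, -1.0], 0)]
weights = dp_sgd_step([0.0, 0.0], batch, lr=0.1, clip_norm=1.0,
                      noise_multiplier=1.1, rng=rng)
```

In practice the per-example clipping and noise addition would come from a DP training library rather than hand-written loops, and the noise multiplier would be chosen by a privacy accountant to meet a target privacy budget.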
