LG, May 9, 2022

Evaluating the Fairness Impact of Differentially Private Synthetic Data

Blake Bullwinkel, Kristen Grabarz, Lily Ke, Scarlett Gong, Chris Tanner, Joshua Allen

arXiv:2205.04321v28.714 citations, h-index: 7

Originality: Incremental advance

AI Analysis

This paper addresses fairness concerns in privacy-preserving data synthesis for sensitive datasets. Its contribution is incremental, since it builds on existing DP methods.

The study investigates how differentially private synthetic data affects fairness in downstream binary classification tasks. It finds that three of the four evaluated synthesizers frequently degrade fairness outcomes, but that pre-processing the training data with multi-label undersampling can improve fairness without reducing accuracy.

Differentially private (DP) synthetic data is a promising approach to maximizing the utility of data containing sensitive information. Due to the suppression of underrepresented classes that is often required to achieve privacy, however, it may be in conflict with fairness. We evaluate four DP synthesizers and present empirical results indicating that three of these models frequently degrade fairness outcomes on downstream binary classification tasks. We draw a connection between fairness and the proportion of minority groups present in the generated synthetic data, and find that training synthesizers on data that are pre-processed via a multi-label undersampling method can promote more fair outcomes without degrading accuracy.
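The pre-processing idea in the abstract can be illustrated with a small sketch. The abstract does not spell out the authors' exact procedure, so the code below assumes one common reading of multi-label undersampling: group records by the combination of a protected attribute and the class label, then downsample every group to the size of the smallest one. The function names, the `group` and `label` column names, and the toy counts are all hypothetical, and no DP synthesizer is involved here.

```python
import random
from collections import defaultdict


def multilabel_undersample(rows, keys, seed=0):
    """Downsample so every observed combination of `keys` values is equally frequent.

    Assumed interpretation of multi-label undersampling, not the paper's code.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    # Every group is cut down to the size of the smallest group.
    target = min(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(rng.sample(members, target))
    rng.shuffle(balanced)
    return balanced


def group_proportions(rows, key):
    """Return the share of rows taking each value of `key`."""
    counts = defaultdict(int)
    for row in rows:
        counts[row[key]] += 1
    return {value: count / len(rows) for value, count in counts.items()}


# Hypothetical toy data: the minority group makes up 15% of the records.
rows = (
    [{"group": "majority", "label": 1}] * 60
    + [{"group": "majority", "label": 0}] * 25
    + [{"group": "minority", "label": 1}] * 5
    + [{"group": "minority", "label": 0}] * 10
)

balanced = multilabel_undersample(rows, keys=("group", "label"))
print(group_proportions(rows, "group"))
print(group_proportions(balanced, "group"))
```

In this sketch the balanced set would be what a synthesizer is trained on, so the minority group is no longer a rare category at training time. The cost is discarded data: the toy set shrinks from 100 rows to 20.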
