LG CR DB IT | May 24, 2023

Post-processing Private Synthetic Data for Improving Utility on Selected Measures

Hao Wang, Shivchander Sudalairaj, John Henning, Kristjan Greenewald, Akash Srivastava

arXiv:2305.15538v2 | 10.79 citations | h-index: 14

Originality: Incremental advance

AI Analysis

This work addresses a problem faced by users of private synthetic data who need the data to satisfy specific utility measures. It is an incremental advance that adds a post-processing step on top of existing generation methods.

The paper observes that private synthetic data generation is agnostic to downstream tasks and introduces a post-processing technique that resamples the synthetic data to improve utility on user-selected measures while preserving privacy and data quality. Experiments show consistent utility improvements across multiple benchmark datasets and state-of-the-art generation algorithms.

Existing private synthetic data generation algorithms are agnostic to downstream tasks. However, end users may have specific requirements that the synthetic data must satisfy. Failure to meet these requirements could significantly reduce the utility of the data for downstream use. We introduce a post-processing technique that improves the utility of the synthetic data with respect to measures selected by the end user, while preserving strong privacy guarantees and dataset quality. Our technique involves resampling from the synthetic data to filter out samples that do not meet the selected utility measures, using an efficient stochastic first-order algorithm to find optimal resampling weights. Through comprehensive numerical experiments, we demonstrate that our approach consistently improves the utility of synthetic data across multiple benchmark datasets and state-of-the-art synthetic data generation algorithms.
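The resampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each utility measure is the average of a per-row query, uses a maximum-entropy (exponential tilting) form for the weights, and replaces the stochastic first-order solver with plain full-batch gradient descent for determinism. The function name `fit_resampling_weights`, the toy data, and the target values are all hypothetical. In a private setting the targets would have to come from a privately released estimate, which this sketch does not cover.

```python
import numpy as np

def fit_resampling_weights(Q, targets, steps=2000, lr=1.0):
    """Find weights w (nonnegative, summing to 1) with w @ Q close to targets.

    Q:       (n, k) array, value of each of k queries on each synthetic row.
    targets: (k,) array, desired average of each query (assumed given).
    Weights take the form w_i proportional to exp(Q[i] @ lam); lam is found
    by gradient descent on the convex dual
        f(lam) = logsumexp(Q @ lam) - lam @ targets.
    """
    n, k = Q.shape
    lam = np.zeros(k)
    for _ in range(steps):
        logits = Q @ lam
        w = np.exp(logits - logits.max())  # subtract max for stability
        w /= w.sum()
        grad = w @ Q - targets             # gradient of the dual objective
        lam -= lr * grad
    logits = Q @ lam
    w = np.exp(logits - logits.max())
    return w / w.sum()

# Toy synthetic data: two binary queries with means near 0.3 and 0.6.
rng = np.random.default_rng(0)
Q = (rng.random((1000, 2)) < np.array([0.3, 0.6])).astype(float)

# Hypothetical targets chosen for illustration only.
targets = np.array([0.5, 0.5])

w = fit_resampling_weights(Q, targets)

# Resample rows with replacement according to the fitted weights.
idx = rng.choice(len(Q), size=len(Q), replace=True, p=w)
resampled = Q[idx]
```

The weighted averages `w @ Q` match the targets closely, while the averages of `resampled` match them only up to sampling noise. Rows that pull the averages away from the targets receive small weights and are rarely drawn, which is the sense in which resampling filters them out.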
