CR DS LG ML | Dec 13, 2024

Differentially Private Multi-Sampling from Distributions

arXiv:2412.10512v1 | 2 citations | h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses the need for synthetic data in exploratory analysis under privacy constraints, offering an incremental advance over existing DP sampling methods.

The paper tackles the problem of generating multiple synthetic samples from a distribution under differential privacy (DP), extending prior work on single-sample DP generation. For finite domains, it improves sample complexity by a factor of m over the baseline of running a single-sampling algorithm m times. For Gaussians with known covariance, it enables single- and multi-sampling under pure DP. It also gives new sample complexity lower bounds.

Many algorithms have been developed to estimate probability distributions subject to differential privacy (DP): such an algorithm takes as input independent samples from a distribution and estimates the density function in a way that is insensitive to any one sample. A recent line of work, initiated by Raskhodnikova et al. (NeurIPS '21), explores a weaker objective: a differentially private algorithm that approximates a single sample from the distribution. Raskhodnikova et al. studied the sample complexity of DP single-sampling, i.e., the minimum number of samples needed to perform this task. They showed that the sample complexity of DP single-sampling is less than the sample complexity of DP learning for certain distribution classes. We define two variants of multi-sampling, where the goal is to privately approximate $m>1$ samples. This better models the realistic scenario where synthetic data is needed for exploratory data analysis. A baseline solution to multi-sampling is to invoke a single-sampling algorithm $m$ times on independently drawn datasets of samples. When the data comes from a finite domain, we improve over the baseline by a factor of $m$ in the sample complexity. When the data comes from a Gaussian, Ghazi et al. (NeurIPS '23) show that single-sampling can be performed under approximate differential privacy; we show it is possible to single- and multi-sample Gaussians with known covariance subject to pure DP. Our solution uses a variant of the Laplace mechanism that is of independent interest. We also give sample complexity lower bounds, one for strong multi-sampling of finite distributions and another for weak multi-sampling of bounded-covariance Gaussians.
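
The baseline described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the paper's algorithm: the single-sampler here is a simple noisy-histogram stand-in (Laplace noise on counts, then sampling from the clipped result), and the names `laplace`, `dp_single_sample`, and `baseline_multi_sample` are hypothetical. The point is the structure: each of the m output samples consumes its own disjoint block of input data, so if n samples suffice for one output, the baseline needs about m·n in total.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_single_sample(data, domain_size, epsilon, rng):
    """Illustrative eps-DP single-sampler over {0, ..., domain_size-1}.

    Assumption: a noisy-histogram stand-in, NOT the paper's method.
    """
    counts = [0.0] * domain_size
    for x in data:
        counts[x] += 1.0
    # Replacing one record changes two counts by 1 each (L1 sensitivity 2),
    # so Laplace noise of scale 2/epsilon makes the histogram eps-DP.
    noisy = [max(c + laplace(2.0 / epsilon, rng), 0.0) for c in counts]
    if sum(noisy) == 0.0:
        # Degenerate case: fall back to a uniform draw.
        return rng.randrange(domain_size)
    # Sampling from the noisy histogram is post-processing of a DP output.
    return rng.choices(range(domain_size), weights=noisy, k=1)[0]


def baseline_multi_sample(data, m, domain_size, epsilon, rng):
    """Baseline: run the single-sampler once on each of m disjoint blocks."""
    block = len(data) // m
    if block == 0:
        raise ValueError("need at least m input samples")
    return [
        dp_single_sample(data[i * block:(i + 1) * block], domain_size, epsilon, rng)
        for i in range(m)
    ]


rng = random.Random(0)
data = [rng.randrange(4) for _ in range(1000)]  # toy dataset on a 4-element domain
samples = baseline_multi_sample(data, m=5, domain_size=4, epsilon=1.0, rng=rng)
```

Because every input record falls in at most one block, changing a single record affects only one call to the single-sampler, so in this sketch the m outputs together remain eps-DP. The cost is that each output sees only a 1/m fraction of the data, which is the factor of $m$ the abstract says can be recovered for finite domains.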
