Tractable MCMC for Private Learning with Pure and Gaussian Differential Privacy
This work addresses privacy concerns in machine learning for data analysts by providing a tractable method to avoid unbounded privacy breaches in approximate DP, though it is incremental in combining existing techniques.
The paper tackles the problem of maintaining pure differential privacy guarantees when using approximate MCMC sampling by introducing the ASAP algorithm, which perturbs MCMC samples based on Wasserstein-infinity distance and achieves optimal rates for DP-ERM with strongly convex and smooth losses in nearly linear time.
Posterior sampling, i.e., exponential mechanism to sample from the posterior distribution, provides $\varepsilon$-pure differential privacy (DP) guarantees and does not suffer from potentially unbounded privacy breach introduced by $(\varepsilon,δ)$-approximate DP. In practice, however, one needs to apply approximate sampling methods such as Markov chain Monte Carlo (MCMC), thus re-introducing the unappealing $δ$-approximation error into the privacy guarantees. To bridge this gap, we propose the Approximate SAample Perturbation (abbr. ASAP) algorithm which perturbs an MCMC sample with noise proportional to its Wasserstein-infinity ($W_\infty$) distance from a reference distribution that satisfies pure DP or pure Gaussian DP (i.e., $δ=0$). We then leverage a Metropolis-Hastings algorithm to generate the sample and prove that the algorithm converges in $W_\infty$ distance. We show that by combining our new techniques with a localization step, we obtain the first nearly linear-time algorithm that achieves the optimal rates in the DP-ERM problem with strongly convex and smooth losses.