On Connecting Stochastic Gradient MCMC and Differential Privacy
This work addresses concerns about the privacy of training data in machine learning applications, offering a way to strengthen confidentiality without sacrificing model performance. Its contribution is incremental, since it applies an existing algorithm to a known bottleneck.
The paper tackles the challenge of balancing privacy and utility in machine learning. It shows that stochastic gradient Markov chain Monte Carlo (SG-MCMC) algorithms satisfy strong differential privacy when their step sizes are carefully chosen, and that they reach state-of-the-art privacy and utility on Bayesian learning tasks.
Significant success has recently been achieved in applying machine learning to real-world applications. This success has raised corresponding concerns about the privacy of training data, which relate to data security and confidentiality. Differential privacy provides a principled and rigorous privacy guarantee for machine learning models. While it is common to design a model that satisfies a required differential-privacy property by injecting noise, it is generally hard to balance the trade-off between privacy and utility. We show that stochastic gradient Markov chain Monte Carlo (SG-MCMC), a class of recently proposed scalable Bayesian posterior sampling algorithms, satisfies strong differential privacy with carefully chosen step sizes. We develop theory on the performance of the proposed differentially private SG-MCMC method. We conduct experiments to support our analysis and show that a standard SG-MCMC sampler without any modification (under a default setting) can reach state-of-the-art performance in terms of both privacy and utility on Bayesian learning.
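To make concrete what "a standard SG-MCMC sampler" looks like, the sketch below implements stochastic gradient Langevin dynamics (SGLD), one well-known member of the SG-MCMC family, on a toy problem: sampling the posterior of a Gaussian mean. The function name `sgld_gaussian_mean`, the toy model, and all default values (step size, batch size, prior variance) are assumptions chosen for illustration. In particular, the step size here is arbitrary and is not the privacy-calibrated choice derived in the paper, so this sketch carries no differential-privacy guarantee by itself.

```python
import numpy as np


def sgld_gaussian_mean(data, step_size=1e-4, batch_size=10, n_iters=2000,
                       prior_var=10.0, noise_var=1.0, seed=0):
    """Draw SGLD samples for the mean of a Gaussian with known variance.

    Illustrative model (an assumption, not taken from the paper):
        theta ~ N(0, prior_var),  x_i | theta ~ N(theta, noise_var)
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = len(data)
    theta = 0.0
    samples = np.empty(n_iters)
    for t in range(n_iters):
        # Sample a minibatch without replacement.
        batch = rng.choice(data, size=batch_size, replace=False)
        # Gradient of the log prior N(0, prior_var).
        grad_prior = -theta / prior_var
        # Minibatch gradient of the log likelihood, rescaled by n / batch_size
        # so that it is an unbiased estimate of the full-data gradient.
        grad_lik = (n / batch_size) * np.sum(batch - theta) / noise_var
        # Langevin update: half a step along the stochastic gradient plus
        # Gaussian noise whose variance equals the step size.
        noise = rng.normal(0.0, np.sqrt(step_size))
        theta = theta + 0.5 * step_size * (grad_prior + grad_lik) + noise
        samples[t] = theta
    return samples


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    toy_data = rng.normal(2.0, 1.0, size=1000)
    draws = sgld_gaussian_mean(toy_data)
    # Discard early iterations as burn-in before summarizing.
    print("posterior mean estimate:", draws[500:].mean())
```

The Gaussian noise added at every update is part of the sampler itself rather than an extra privacy mechanism, which is the intuition behind the claim that an unmodified SG-MCMC sampler can be differentially private once its step size is suitably constrained.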