Private Posterior distributions from Variational approximations
This work addresses a problem of practical interest to researchers and practitioners: performing valid statistical inference from private data. Its contribution is incremental, however, since it applies existing variational methods to one specific model.
The paper tackles inaccurate statistical inference from differentially private data by incorporating the privacy mechanism into the likelihood and treating the original data as missing. It develops variational approximations to the posterior distributions of naive Bayes log-linear model parameters and shows in simulations that these approximations outperform the naive method, which ignores the added noise.
Privacy-preserving mechanisms such as differential privacy inject additional randomness, in the form of noise, into the data, beyond the randomness of the sampling mechanism. Ignoring this additional noise can lead to inaccurate and invalid inferences. In this paper, we incorporate the privacy mechanism explicitly into the likelihood function by treating the original data as missing, with the end goal of estimating posterior distributions over model parameters. This leads to a principled way of performing valid statistical inference using private data; however, the corresponding likelihoods are intractable. We derive fast and accurate variational approximations to tackle the intractable likelihoods that arise due to privacy. We focus on estimating posterior distributions of the parameters of the naive Bayes log-linear model, where the sufficient statistics of this model are shared through a differentially private interface. Using a simulation study, we show that the posterior approximations outperform the naive method of ignoring the noise addition mechanism.
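To make the idea of treating the original data as missing concrete, the following is a minimal sketch on a toy problem, not the paper's method. It assumes a single binomial count x out of n released as z = x + Laplace noise with scale 1/epsilon (a sensitivity-1 count, which is an assumption here), a flat prior on the success probability, and a grid evaluation in place of a variational approximation. In this one-dimensional case the sum over the missing count is cheap to compute exactly; the paper's naive Bayes log-linear model is where that sum becomes intractable. All function names are hypothetical.

```python
import numpy as np
from math import lgamma


def log_binom_pmf(x, n, theta):
    """Log Binomial(x; n, theta) for an integer array x and scalar theta in (0, 1)."""
    log_comb = np.array(
        [lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1) for k in x]
    )
    return log_comb + x * np.log(theta) + (n - x) * np.log1p(-theta)


def normalize(log_lik):
    """Turn log-likelihood values on a grid into weights that sum to 1 (flat prior)."""
    w = np.exp(log_lik - np.max(log_lik))
    return w / w.sum()


def noise_aware_posterior(z, n, epsilon, grid):
    """Grid posterior that sums over the unobserved true count x.

    Assumption: z = x + Laplace(scale=1/epsilon), i.e. a sensitivity-1 count.
    """
    x = np.arange(n + 1)
    scale = 1.0 / epsilon
    # Log density of the Laplace noise that would map each candidate x to z.
    log_noise = -np.abs(z - x) / scale - np.log(2.0 * scale)
    # Marginal likelihood: log sum_x Binomial(x; n, theta) * Laplace(z - x).
    log_lik = np.array(
        [np.logaddexp.reduce(log_binom_pmf(x, n, t) + log_noise) for t in grid]
    )
    return normalize(log_lik)


def naive_posterior(z, n, grid):
    """Grid posterior that treats the rounded, clipped noisy count as the true count."""
    x = np.array([int(np.clip(round(z), 0, n))])
    log_lik = np.array([log_binom_pmf(x, n, t)[0] for t in grid])
    return normalize(log_lik)


def posterior_sd(grid, weights):
    """Standard deviation of a discrete distribution on the grid."""
    mean = np.sum(grid * weights)
    return float(np.sqrt(np.sum(weights * (grid - mean) ** 2)))


# Illustrative values only: a noisy count of 20 out of 50 with epsilon = 0.1.
grid = np.linspace(0.005, 0.995, 199)
z, n, epsilon = 20.0, 50, 0.1
aware = noise_aware_posterior(z, n, epsilon, grid)
naive = naive_posterior(z, n, grid)
print(posterior_sd(grid, aware), posterior_sd(grid, naive))
```

With these illustrative values the noise-aware posterior should be noticeably wider than the naive one, because it accounts for uncertainty about the true count as well as sampling variability. The naive posterior is overconfident in exactly the sense the abstract describes.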