Simulation-based Bayesian Inference from Privacy Protected Data
This work addresses the biased statistical analyses that researchers and practitioners face when working with differentially private data. The contribution is incremental, as it builds on existing privacy and inference techniques.
The paper tackles the challenge of making valid statistical inferences from privacy-protected data, where the injected noise often biases standard analyses. It proposes simulation-based inference methods, including sequential Monte Carlo approximate Bayesian computation and neural conditional density estimators, and demonstrates their feasibility and their ability to correct these biases in experiments with an infectious disease model and linear regression.
Many modern statistical analysis and machine learning applications require training models on sensitive user data. Under a formal definition of privacy protection, differentially private algorithms inject calibrated noise into the confidential data or during the data analysis process to produce privacy-protected datasets or queries. However, restricting access to only privatized data during statistical analysis makes it computationally challenging to make valid statistical inferences. In this work, we propose simulation-based methods for inference from privacy-protected datasets. In addition to sequential Monte Carlo approximate Bayesian computation, we adopt neural conditional density estimators as a flexible family of distributions to approximate the posterior distribution of model parameters given the observed private query results. We illustrate our methods on discrete time-series data under an infectious disease model and with ordinary linear regression models. Illustrating the privacy-utility trade-off, our experiments and analysis demonstrate the necessity and feasibility of designing valid statistical inference procedures to correct for biases introduced by the privacy-protection mechanisms.
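To make the idea of simulating through the privacy mechanism concrete, the following is a minimal sketch and not the paper's implementation. It swaps in plain rejection approximate Bayesian computation for the sequential Monte Carlo variant named in the abstract, and it uses a toy setting that is entirely an assumption here: a Bernoulli success probability `theta` with a uniform prior, a single count query, and a Laplace mechanism with sensitivity 1. The function names, the tolerance `tol`, and all numeric settings are hypothetical choices made for illustration.

```python
import numpy as np


def privatize_count(count, epsilon, rng):
    """Laplace mechanism for a count query (sensitivity 1 is assumed)."""
    return count + rng.laplace(loc=0.0, scale=1.0 / epsilon)


def rejection_abc(s_obs, n, epsilon, n_sims, tol, rng):
    """Rejection ABC that simulates both the data and the privacy noise.

    Returns prior draws of theta whose simulated private query result
    lands within `tol` of the observed private query result `s_obs`.
    """
    theta = rng.uniform(0.0, 1.0, size=n_sims)        # draws from a uniform prior
    counts = rng.binomial(n, theta)                   # simulated confidential counts
    noise = rng.laplace(0.0, 1.0 / epsilon, size=n_sims)
    s_sim = counts + noise                            # simulated private query results
    keep = np.abs(s_sim - s_obs) <= tol               # accept near-matches only
    return theta[keep]


rng = np.random.default_rng(0)
n, theta_true, epsilon = 200, 0.3, 0.5                # illustrative settings

# The analyst never sees `confidential_count`, only the noisy `s_obs`.
confidential_count = rng.binomial(n, theta_true)
s_obs = privatize_count(confidential_count, epsilon, rng)

posterior = rejection_abc(s_obs, n, epsilon, n_sims=200_000, tol=2.0, rng=rng)
print(f"private query result: {s_obs:.2f}")
print(f"accepted draws: {posterior.size}")
print(f"approximate posterior mean of theta: {posterior.mean():.3f}")
print(f"approximate posterior sd of theta: {posterior.std():.3f}")
```

Because the Laplace noise is drawn inside the simulator, the accepted draws reflect uncertainty from both sampling and privacy protection, rather than treating the noisy count as if it were the confidential one. A smaller `epsilon` widens the noise and should widen the approximate posterior, which is one way to see the privacy-utility trade-off in this toy setting.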