Distribution-Invariant Differential Privacy
This work addresses the distortion of downstream data analysis faced by users of differentially private data, though the contribution appears incremental because it builds on existing privacy frameworks.
The paper tackles the trade-off between privacy protection and statistical accuracy in differential privacy by developing a distribution-invariant privatization method, achieving superior statistical accuracy under strict privacy in simulations and real-world benchmarks.
Differential privacy is becoming a gold standard for protecting the privacy of publicly shared data. It has been widely used in social science, data science, public health, information technology, and the U.S. decennial census. Nevertheless, to guarantee differential privacy, existing methods may unavoidably alter the conclusions of the original data analysis, because privatization often changes the sample distribution. This phenomenon is known as the trade-off between privacy protection and statistical accuracy. In this work, we mitigate this trade-off by developing a distribution-invariant privatization (DIP) method that reconciles high statistical accuracy with strict differential privacy. As a result, any downstream statistical or machine learning task yields essentially the same conclusion as if one had used the original data. Numerically, under the same strictness of privacy protection, DIP achieves superior statistical accuracy in a wide range of simulation studies and real-world benchmarks.
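To make the claim that privatization changes the sample distribution concrete, the following is a minimal sketch of the standard Laplace mechanism applied record by record. It is not the DIP method, whose construction is not described in this summary. The function names `laplace_noise` and `privatize`, the uniform toy data, the bounds [0, 1], and the privacy budget `epsilon = 1.0` are all illustrative assumptions.

```python
import random
import statistics


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(values, epsilon, lower, upper, rng):
    # Hypothetical helper: clip each record to [lower, upper], then add
    # Laplace noise with scale = sensitivity / epsilon, where the
    # sensitivity of a single bounded record is (upper - lower).
    scale = (upper - lower) / epsilon
    return [min(max(v, lower), upper) + laplace_noise(scale, rng) for v in values]


rng = random.Random(0)  # fixed seed so the run is reproducible
original = [rng.uniform(0.0, 1.0) for _ in range(20000)]  # toy data (assumption)
private = privatize(original, epsilon=1.0, lower=0.0, upper=1.0, rng=rng)

mean_original = statistics.fmean(original)
mean_private = statistics.fmean(private)
var_original = statistics.pvariance(original)
var_private = statistics.pvariance(private)

print(f"mean:     original={mean_original:.3f}  private={mean_private:.3f}")
print(f"variance: original={var_original:.3f}  private={var_private:.3f}")
```

In this sketch the mean is roughly preserved, but the variance of the released data is inflated by about 2 * scale**2 (the variance of the Laplace noise), so any analysis that depends on spread, quantiles, or distributional shape would be distorted. This is the kind of distortion a distribution-invariant approach aims to avoid.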