CRJun 18, 2021
Non-parametric Differentially Private Confidence Intervals for the MedianJoerg Drechsler, Ira Globus-Harris, Audra McMillan et al.
Differential privacy is a restriction on data processing algorithms that provides strong confidentiality guarantees for individual records in the data. However, research on proper statistical inference, that is, research on properly quantifying the uncertainty of the (noisy) sample estimate regarding the true value in the population, is currently still limited. This paper proposes and evaluates several strategies to compute valid differentially private confidence intervals for the median. Instead of computing a differentially private point estimate and deriving its uncertainty, we directly estimate the interval bounds and discuss why this approach is superior if ensuring privacy is important. We also illustrate that addressing both sources of uncertainty--the error from sampling and the error from protecting the output--simultaneously should be preferred over simpler approaches that incorporate the uncertainty in a sequential fashion. We evaluate the performance of the different algorithms under various parameter settings in extensive simulation studies and demonstrate how the findings could be applied in practical settings using data from the 1940 Decennial Census.
APMar 17, 2021
Accuracy Gains from Privacy Amplification Through Sampling for Differential PrivacyJingchen Hu, Joerg Drechsler, Hang J. Kim
Recent research in differential privacy demonstrated that (sub)sampling can amplify the level of protection. For example, for $ε$-differential privacy and simple random sampling with sampling rate $r$, the actual privacy guarantee is approximately $rε$, if a value of $ε$ is used to protect the output from the sample. In this paper, we study whether this amplification effect can be exploited systematically to improve the accuracy of the privatized estimate. Specifically, assuming the agency has information for the full population, we ask under which circumstances accuracy gains could be expected, if the privatized estimate would be computed on a random sample instead of the full population. We find that accuracy gains can be achieved for certain regimes. However, gains can typically only be expected, if the sensitivity of the output with respect to small changes in the database does not depend too strongly on the size of the database. We only focus on algorithms that achieve differential privacy by adding noise to the final output and illustrate the accuracy implications for two commonly used statistics: the mean and the median. We see our research as a first step towards understanding the conditions required for accuracy gains in practice and we hope that these findings will stimulate further research broadening the scope of differential privacy algorithms and outputs considered.
CRFeb 17, 2021
Differential Privacy for Government Agencies -- Are We There Yet?Joerg Drechsler
Government agencies typically need to take potential risks of disclosure into account whenever they publish statistics based on their data or give external researchers access to collected data. In this context, the promise of formal privacy guarantees offered by concepts such as differential privacy seems to be the panacea enabling the agencies to quantify and control the privacy loss incurred by any data release exactly. Nevertheless, despite the excitement in academia and industry, most agencies -- with the prominent exception of the U.S. Census Bureau -- have been reluctant to even consider the concept for their data release strategy. This paper discusses potential reasons for this. We argue that the requirements for implementing differential privacy approaches at government agencies are often fundamentally different from the requirements in industry. This raises many challenges and questions that still need to be addressed before the concept can be used as an overarching principle when sharing data with the public. The paper does not offer any solutions to these challenges. Instead, we hope to stimulate some collaborative research efforts, as we believe that many of the problems can only be addressed by interdisciplinary collaborations.