CR Jul 16, 2022
Sotto Voce: Federated Speech Recognition with Differential Privacy Guarantees
Michael Shoemate, Kevin Jett, Ethan Cowan et al.
Speech data is expensive to collect and highly sensitive with respect to its sources. Organizations often collect small datasets independently for their own use, but these datasets are frequently too small to meet the demands of machine learning. Organizations could pool these datasets and jointly build a strong ASR system; sharing data in the clear, however, comes with tremendous risk, both of intellectual property loss and of loss of privacy for the individuals who appear in the dataset. In this paper, we offer a potential solution for learning an ML model across multiple organizations where we can provide mathematical guarantees limiting privacy loss. We use a Federated Learning approach built on a strong foundation of Differential Privacy techniques. We apply these to a senone classification prototype and demonstrate that the model improves with the addition of private data while still respecting privacy.
CR Jan 12, 2021 [Code]
Privacy-Preserving Randomized Controlled Trials: A Protocol for Industry Scale Deployment
Mahnush Movahedi, Benjamin M. Case, Andrew Knox et al.
In this paper, we outline a way to deploy a privacy-preserving protocol for multiparty Randomized Controlled Trials on the scale of 500 million rows of data and more than a billion gates. Randomized Controlled Trials (RCTs) are widely used to improve business and policy decisions in various sectors such as healthcare, education, criminology, and marketing. A Randomized Controlled Trial is a scientifically rigorous method to measure the effectiveness of a treatment. This is accomplished by randomly allocating subjects to two or more groups, treating them differently, and then comparing the outcomes across groups. In many scenarios, multiple parties hold different parts of the data for conducting and analyzing RCTs. Given the privacy requirements and expectations of each of these parties, it is often challenging to have a centralized store of data to conduct and analyze RCTs. We address this challenge with a three-stage solution. The first stage uses the Private Secret Share Set Intersection (PS$^3$I) solution to create a joined set and establish secret shares without revealing membership, while discarding individuals who were placed into more than one group. The second stage runs multiple instances of a general-purpose MPC over a sharded database to aggregate statistics about each experimental group while discarding individuals who took an action before they received treatment. The third stage adds distributed and calibrated Differential Privacy (DP) noise to the aggregate statistics and uncertainty measures, providing formal two-sided privacy guarantees. We also evaluate the performance of multiple open-source general-purpose MPC libraries for this task. We additionally demonstrate how we have used this to create a working ads effectiveness measurement product capable of measuring hundreds of millions of individuals per experiment.
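The abstract does not say how the distributed noise in the third stage is generated. As a rough illustration only, the sketch below uses one standard construction, assumed here rather than taken from the paper: a Laplace draw with scale b can be written as a sum over n parties of differences of Gamma(1/n, b) variables, so each party contributes a share and no single share is the full noise. The function names, the choice of Laplace noise, and the plain-float arithmetic are all assumptions; in a real deployment the shares would be combined inside the MPC, with a sampler designed for finite-precision arithmetic.

```python
import random


def laplace_noise_shares(num_parties, scale, rng=random):
    """Return one noise share per party; the shares sum to a Laplace(0, scale) draw.

    Uses the infinite divisibility of the Laplace distribution: the difference
    of two Gamma(shape=1/n, scale=b) variables, summed over n parties, is
    Laplace(0, b).
    """
    shape = 1.0 / num_parties
    return [
        rng.gammavariate(shape, scale) - rng.gammavariate(shape, scale)
        for _ in range(num_parties)
    ]


def noisy_aggregate(true_value, sensitivity, epsilon, num_parties, rng=random):
    """Add the parties' shares to an aggregate (hypothetical helper).

    The combined noise has scale sensitivity / epsilon, the usual calibration
    for the Laplace mechanism.
    """
    shares = laplace_noise_shares(num_parties, sensitivity / epsilon, rng)
    return true_value + sum(shares)
```

For example, `noisy_aggregate(1200, sensitivity=1, epsilon=0.5, num_parties=2)` returns the count 1200 plus a single Laplace draw of scale 2, split across two shares.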
ME Oct 27, 2021
Unbiased Statistical Estimation and Valid Confidence Intervals Under Differential Privacy
Christian Covington, Xi He, James Honaker et al.
We present a method for producing unbiased parameter estimates and valid confidence intervals under the constraints of differential privacy, a formal framework for limiting individual information leakage from sensitive data. Prior work in this area is limited in that it is tailored to calculating confidence intervals for specific statistical procedures, such as mean estimation or simple linear regression. While other recent work can produce confidence intervals for more general sets of procedures, it either yields only approximately unbiased estimates, is designed for one-dimensional outputs, or assumes significant user knowledge about the data-generating distribution. Our method induces distributions of mean and covariance estimates via the bag of little bootstraps (BLB) and uses them to privately estimate the parameters' sampling distribution via a generalized version of the CoinPress estimation algorithm. If the user can bound the parameters of the BLB-induced parameters and provide heavier-tailed families, the algorithm produces unbiased parameter estimates and valid confidence intervals which hold with arbitrarily high probability. These results hold in high dimensions and for any estimation procedure which behaves nicely under the bootstrap.
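To make the BLB step concrete, here is a minimal non-private sketch of the bag of little bootstraps for a sample mean. It covers only how BLB induces a distribution of estimates; the private CoinPress-style estimation described in the abstract is omitted. The function name, the subset-size exponent `gamma`, and the default counts are illustrative assumptions, not values from the paper.

```python
import random
import statistics


def blb_mean_estimates(data, num_subsets=5, resamples=20, gamma=0.6, rng=random):
    """Return, for each small subset, a list of bootstrap estimates of the mean.

    Each subset holds about n**gamma points drawn without replacement; every
    resample draws n points with replacement from that subset, so each
    estimate behaves like one computed on a full-size bootstrap sample.
    """
    n = len(data)
    subset_size = max(2, int(n ** gamma))
    all_estimates = []
    for _ in range(num_subsets):
        subset = rng.sample(data, subset_size)
        estimates = [
            statistics.fmean(rng.choices(subset, k=n))
            for _ in range(resamples)
        ]
        all_estimates.append(estimates)
    return all_estimates
```

The spread of the estimates within each inner list approximates the sampling variability of the mean at the full sample size, even though each subset is much smaller than the data.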
CR Oct 15, 2021
The Privacy-preserving Padding Problem: Non-negative Mechanisms for Conservative Answers with Differential Privacy
Benjamin M. Case, James Honaker, Mahnush Movahedi
Differentially private noise mechanisms commonly use symmetric noise distributions. This is attractive both for achieving the differential privacy definition and for unbiased expectations in the noised answers. However, there are contexts in which a noisy answer only has utility if it is conservative, that is, has known-signed error, which we call a padded answer. Satisfying the DP definition with one-sided error seems paradoxical, but we show how it is possible to bury the paradox in approximate DP's delta parameter. We develop a few one-sided padding mechanisms that always give conservative answers but still achieve approximate differential privacy. We show how these mechanisms can be applied in a few select areas, including making the cardinalities of set intersections and unions revealed in Private Set Intersection protocols differentially private, and enabling multiparty computation protocols to compute on sparse data whose exact sizes are made differentially private rather than performing a fully oblivious, more expensive computation.
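One simple way to see how one-sided error can fit into the delta parameter is a shifted, clamped Laplace mechanism. This is an illustrative construction and not necessarily one of the mechanisms developed in the paper. The noise is max(0, mu + Lap(b)) with b = sensitivity / epsilon. Clamping changes the output only when the Laplace draw falls below -mu, which happens with probability 0.5 * exp(-mu / b), and a standard coupling argument then gives (epsilon, delta)-DP with delta at most (1 + e^epsilon) times that probability. The function name and the floating-point sampler are assumptions for the sketch; production code would need a sampler that is safe under finite precision.

```python
import math
import random


def padded_answer(true_value, epsilon, delta, sensitivity=1.0, rng=random):
    """Return true_value plus non-negative noise (illustrative sketch).

    The shift mu is chosen so that
        (1 + e^epsilon) * 0.5 * exp(-mu / scale) <= delta,
    which bounds the privacy cost of clamping the noise at zero.
    """
    scale = sensitivity / epsilon
    mu = scale * math.log((1.0 + math.exp(epsilon)) / (2.0 * delta))
    # Difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    laplace = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    noise = max(0.0, mu + laplace)
    return true_value + noise
```

The released value is never below the true value, at the cost of an expected overshoot of roughly mu, which grows as delta shrinks.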
CR Sep 14, 2016
PSI (Ψ): a Private data Sharing Interface
Marco Gaboardi, James Honaker, Gary King et al.
We provide an overview of PSI ("a Private data Sharing Interface"), a system we are developing to enable researchers in the social sciences and other fields to share and explore privacy-sensitive datasets with the strong privacy protections of differential privacy.