CRMay 1, 2019
The Podium Mechanism: Improving on the Laplace and Staircase MechanismsVasyl Pihur
The Podium mechanism guarantees ($ε, 0$)-differential privacy by sampling noise from a \emph{finite} mixture of three uniform distributions. By carefully constructing such a mixture distribution, we trivially guarantee privacy properties, while minimizing the variance of the noise added to our continuous outcome. Our gains in variance control are due to the "truncated" nature of the Podium mechanism where support for the noise distribution is maintained as close as possible to the sensitivity of our data collection, unlike the \emph{infinite} support that characterizes both the Laplace and Staircase mechanisms. In a high-privacy regime ($ε< 1$), the Podium mechanism outperforms the other two by 50-70\% in terms of the noise variance reduction, while in a low privacy regime ($ε\to \infty$), it asymptotically approaches the Staircase mechanism.
CRJul 11, 2018
Differentially-Private "Draw and Discard" Machine LearningVasyl Pihur, Aleksandra Korolova, Frederick Liu et al.
In this work, we propose a novel framework for privacy-preserving client-distributed machine learning. It is motivated by the desire to achieve differential privacy guarantees in the local model of privacy in a way that satisfies all systems constraints using asynchronous client-server communication and provides attractive model learning properties. We call it "Draw and Discard" because it relies on random sampling of models for load distribution (scalability), which also provides additional server-side privacy protections and improved model quality through averaging. We present the mechanics of client and server components of "Draw and Discard" and demonstrate how the framework can be applied to learning Generalized Linear models. We then analyze the privacy guarantees provided by our approach against several types of adversaries and showcase experimental results that provide evidence for the framework's viability in practical deployments.
CRMar 4, 2015
Building a RAPPOR with the Unknown: Privacy-Preserving Learning of Associations and Data DictionariesGiulia Fanti, Vasyl Pihur, Úlfar Erlingsson
Techniques based on randomized response enable the collection of potentially sensitive data from clients in a privacy-preserving manner with strong local differential privacy guarantees. One of the latest such technologies, RAPPOR, allows the marginal frequencies of an arbitrary set of strings to be estimated via privacy-preserving crowdsourcing. However, this original estimation process requires a known set of possible strings; in practice, this dictionary can often be extremely large and sometimes completely unknown. In this paper, we propose a novel decoding algorithm for the RAPPOR mechanism that enables the estimation of "unknown unknowns," i.e., strings we do not even know we should be estimating. To enable learning without explicit knowledge of the dictionary, we develop methodology for estimating the joint distribution of two or more variables collected with RAPPOR. This is a critical step towards understanding relationships between multiple variables collected in a privacy-preserving manner.
CRJul 25, 2014
RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal ResponseÚlfar Erlingsson, Vasyl Pihur, Aleksandra Korolova
Randomized Aggregatable Privacy-Preserving Ordinal Response, or RAPPOR, is a technology for crowdsourcing statistics from end-user client software, anonymously, with strong privacy guarantees. In short, RAPPORs allow the forest of client data to be studied, without permitting the possibility of looking at individual trees. By applying randomized response in a novel manner, RAPPOR provides the mechanisms for such collection as well as for efficient, high-utility analysis of the collected data. In particular, RAPPOR permits statistics to be collected on the population of client-side strings with strong privacy guarantees for each client, and without linkability of their reports. This paper describes and motivates RAPPOR, details its differential-privacy and utility guarantees, discusses its practical deployment and properties in the face of different attack models, and, finally, gives results of its application to both synthetic and real-world data.