CRMar 4, 2015

Building a RAPPOR with the Unknown: Privacy-Preserving Learning of Associations and Data Dictionaries

Giulia Fanti, Vasyl Pihur, Úlfar Erlingsson

arXiv:1503.01214v134.7315 citations

Originality Highly original

AI Analysis

This work addresses a critical limitation in privacy-preserving data collection for applications like crowdsourcing, enabling more flexible analysis of associations between variables.

The paper tackles the problem of estimating unknown strings and joint distributions from data collected via the RAPPOR mechanism, which originally required a known dictionary, by proposing a novel decoding algorithm that enables learning without explicit knowledge of the dictionary.

Techniques based on randomized response enable the collection of potentially sensitive data from clients in a privacy-preserving manner with strong local differential privacy guarantees. One of the latest such technologies, RAPPOR, allows the marginal frequencies of an arbitrary set of strings to be estimated via privacy-preserving crowdsourcing. However, this original estimation process requires a known set of possible strings; in practice, this dictionary can often be extremely large and sometimes completely unknown. In this paper, we propose a novel decoding algorithm for the RAPPOR mechanism that enables the estimation of "unknown unknowns," i.e., strings we do not even know we should be estimating. To enable learning without explicit knowledge of the dictionary, we develop methodology for estimating the joint distribution of two or more variables collected with RAPPOR. This is a critical step towards understanding relationships between multiple variables collected in a privacy-preserving manner.

View on arXiv PDF

Similar