CRMar 4, 2015

Building a RAPPOR with the Unknown: Privacy-Preserving Learning of Associations and Data Dictionaries

arXiv:1503.01214v1315 citations
Originality Highly original
AI Analysis

This work addresses a critical limitation in privacy-preserving data collection for applications like crowdsourcing, enabling more flexible analysis of associations between variables.

The paper tackles the problem of estimating unknown strings and joint distributions from data collected via the RAPPOR mechanism, which originally required a known dictionary, by proposing a novel decoding algorithm that enables learning without explicit knowledge of the dictionary.

Techniques based on randomized response enable the collection of potentially sensitive data from clients in a privacy-preserving manner with strong local differential privacy guarantees. One of the latest such technologies, RAPPOR, allows the marginal frequencies of an arbitrary set of strings to be estimated via privacy-preserving crowdsourcing. However, this original estimation process requires a known set of possible strings; in practice, this dictionary can often be extremely large and sometimes completely unknown. In this paper, we propose a novel decoding algorithm for the RAPPOR mechanism that enables the estimation of "unknown unknowns," i.e., strings we do not even know we should be estimating. To enable learning without explicit knowledge of the dictionary, we develop methodology for estimating the joint distribution of two or more variables collected with RAPPOR. This is a critical step towards understanding relationships between multiple variables collected in a privacy-preserving manner.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes