CR LG | Apr 12, 2023

Measuring Re-identification Risk

CJ Carey, Travis Dick, Alessandro Epasto, Adel Javanmard, Josh Karlin, Shankar Kumar, Andres Munoz Medina, Vahab Mirrokni, Gabriel Henrique Nunes, Sergei Vassilvitskii, Peilin Zhong

arXiv:2304.07210v2 | 10.520 citations | h-index: 62

Originality: Incremental advance

AI Analysis

This work addresses privacy concerns for users of personalization services by providing a rigorous method to assess re-identification risk. The advance is incremental: it offers a formal framework for an existing problem.

The paper tackles the problem of measuring re-identification risk in user representations such as embeddings. It presents a theoretical framework based on hypothesis testing that formally bounds the probability that an attacker can identify a user, and applies it to real-world scenarios such as Chrome's Topics API.

Compact user representations (such as embeddings) form the backbone of personalization services. In this work, we present a new theoretical framework to measure re-identification risk in such user representations. Our framework, based on hypothesis testing, formally bounds the probability that an attacker may be able to obtain the identity of a user from their representation. As an application, we show how our framework is general enough to model important real-world applications such as Chrome's Topics API for interest-based advertising. We complement our theoretical bounds by showing provably good attack algorithms for re-identification that we use to estimate the re-identification risk in the Topics API. We believe this work provides a rigorous and interpretable notion of re-identification risk and a framework to measure it that can be used to inform real-world applications.
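
To make the idea of estimating re-identification risk empirically more concrete, the following is a minimal toy sketch in Python. It is not the paper's framework or attack algorithm. It assumes a simplified topics-style setting in which each user has a hidden set of top topics, two sites each observe one topic per epoch, and an attacker links a visitor on site B to a visitor on site A by picking the largest overlap in observed topics. The function name `simulate_reid_rate` and every parameter value (number of topics, `top_k`, `noise`, and so on) are hypothetical choices made for illustration only.

```python
import random


def simulate_reid_rate(n_users=200, n_topics=350, top_k=5, epochs=8,
                       noise=0.05, trials=200, seed=0):
    """Empirical success rate of a simple overlap-matching attacker.

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)

    # Hidden ground truth: each user's set of top-k topics.
    profiles = [rng.sample(range(n_topics), top_k) for _ in range(n_users)]

    def observe(profile):
        # One topic per epoch: with probability `noise` a uniformly random
        # topic, otherwise a uniformly random topic from the user's profile.
        return [
            rng.randrange(n_topics) if rng.random() < noise
            else rng.choice(profile)
            for _ in range(epochs)
        ]

    # Topics that site A has observed for every user.
    site_a = [set(observe(p)) for p in profiles]

    hits = 0
    for _ in range(trials):
        target = rng.randrange(n_users)
        seen_b = set(observe(profiles[target]))  # site B's view of the target

        # Attacker: guess the site-A user with the largest topic overlap,
        # breaking ties uniformly at random.
        scores = [len(seen_b & seen_a) for seen_a in site_a]
        best = max(scores)
        candidates = [u for u, s in enumerate(scores) if s == best]
        hits += rng.choice(candidates) == target

    return hits / trials


if __name__ == "__main__":
    for e in (1, 2, 4, 8):
        print(e, simulate_reid_rate(epochs=e))
```

The returned fraction is only a Monte Carlo estimate for this particular heuristic attacker under these toy assumptions. A formal bound of the kind the abstract describes would have to hold for every attacker, which a simulation like this cannot establish.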

