Jordi Soria-Comas

h-index17

4papers

277citations

Novelty45%

AI Score24

Ranked #171,085 of 194,257 authors (top 88%)#5,050 in CR (top 75%)

4 Papers

7.2CROct 21, 2020

Multi-Dimensional Randomized Response

Josep Domingo-Ferrer, Jordi Soria-Comas

In our data world, a host of not necessarily trusted controllers gather data on individual subjects. To preserve her privacy and, more generally, her informational self-determination, the individual has to be empowered by giving her agency on her own data. Maximum agency is afforded by local anonymization, that allows each individual to anonymize her own data before handing them to the data controller. Randomized response (RR) is a local anonymization approach able to yield multi-dimensional full sets of anonymized microdata that are valid for exploratory analysis and machine learning. This is so because an unbiased estimate of the distribution of the true data of individuals can be obtained from their pooled randomized data. Furthermore, RR offers rigorous privacy guarantees. The main weakness of RR is the curse of dimensionality when applied to several attributes: as the number of attributes grows, the accuracy of the estimated true data distribution quickly degrades. We propose several complementary approaches to mitigate the dimensionality problem. First, we present two basic protocols, separate RR on each attribute and joint RR for all attributes, and discuss their limitations. Then we introduce an algorithm to form clusters of attributes so that attributes in different clusters can be viewed as independent and joint RR can be performed within each cluster. After that, we introduce an adjustment algorithm for the randomized data set that repairs some of the accuracy loss due to assuming independence between attributes when using RR separately on each attribute or due to assuming independence between clusters in cluster-wise RR. We also present empirical work to illustrate the proposed methods.

4.2CRMar 6, 2018

Connecting Randomized Response, Post-Randomization, Differential Privacy and t-Closeness via Deniability and Permutation

Josep Domingo-Ferrer, Jordi Soria-Comas

We explore some novel connections between the main privacy models in use and we recall a few known ones. We show these models to be more related than commonly understood, around two main principles: deniability and permutation. In particular, randomized response turns out to be very modern in spite of it having been introduced over 50 years ago: it is a local anonymization method and it allows understanding the protection offered by $ε$-differential privacy when $ε$ is increased to improve utility. A similar understanding on the effect of large $ε$ in terms of deniability is obtained from the connection between $ε$-differential privacy and t-closeness. Finally, the post-randomization method (PRAM) is shown to be viewable as permutation and to be connected with randomized response and differential privacy. Since the latter is also connected with t-closeness, it follows that the permutation principle can explain the guarantees offered by all those models. Thus, calibrating permutation is very relevant in anonymization, and we conclude by sketching two ways of doing it.

24.0CRDec 7, 2016

Individual Differential Privacy: A Utility-Preserving Formulation of Differential Privacy Guarantees

Jordi Soria-Comas, Josep Domingo-Ferrer, David Sánchez et al.

Differential privacy is a popular privacy model within the research community because of the strong privacy guarantee it offers, namely that the presence or absence of any individual in a data set does not significantly influence the results of analyses on the data set. However, enforcing this strict guarantee in practice significantly distorts data and/or limits data uses, thus diminishing the analytical utility of the differentially private results. In an attempt to address this shortcoming, several relaxations of differential privacy have been proposed that trade off privacy guarantees for improved data utility. In this work, we argue that the standard formalization of differential privacy is stricter than required by the intuitive privacy guarantee it seeks. In particular, the standard formalization requires indistinguishability of results between any pair of neighbor data sets, while indistinguishability between the actual data set and its neighbor data sets should be enough. This limits the data controller's ability to adjust the level of protection to the actual data, hence resulting in significant accuracy loss. In this respect, we propose individual differential privacy, an alternative differential privacy notion that offers em the same privacy guarantees as standard differential privacy to individuals (even though not to groups of individuals). This new notion allows the data controller to adjust the distortion to the actual data set, which results in less distortion and more analytical accuracy. We propose several mechanisms to attain individual differential privacy and we compare the new notion against standard differential privacy in terms of the accuracy of the analytical results.

17.1CRDec 16, 2015

From t-closeness to differential privacy and vice versa in data anonymization

J. Domingo-Ferrer, J. Soria-Comas

k-Anonymity and ε-differential privacy are two mainstream privacy models, the former introduced to anonymize data sets and the latter to limit the knowledge gain that results from including one individual in the data set. Whereas basic k-anonymity only protects against identity disclosure, t-closeness was presented as an extension of k-anonymity that also protects against attribute disclosure. We show here that, if not quite equivalent, t-closeness and ε-differential privacy are strongly related to one another when it comes to anonymizing data sets. Specifically, k-anonymity for the quasi-identifiers combined with ε-differential privacy for the confidential attributes yields stochastic t-closeness (an extension of t-closeness), with t a function of k and ε. Conversely, t-closeness can yield ε- differential privacy when t = exp(ε/2) and the assumptions made by t-closeness about the prior and posterior views of the data hold