Nicolas Ruiz

CR
3papers
14citations
Novelty62%
AI Score25

3 Papers

CRDec 7, 2017
A general cipher for individual data anonymization

Nicolas Ruiz

Over the years, the literature on individual data anonymization has burgeoned in many directions. Borrowing from several areas of other sciences, the current diversity of concepts, models and tools available contributes to understanding and fostering individual data dissemination in a privacy-preserving way, as well as unleashing new sources of information for the benefits of society at large. However, such diversity doesn't come without some difficulties. Currently, the task of selecting the optimal analytical environment to conduct anonymization is complicated by the multitude of available choices. Based on recent contributions from the literature and inspired by cryptography, this paper proposes the first cipher for data anonymization. The functioning of this cipher shows that, in fact, every anonymization method can be viewed as a general form of rank swapping with unconstrained permutation structures. Beyond all the currently existing methods that it can mimic, this cipher offers a new way to practice data anonymization, notably by performing anonymization in an ex ante way, instead of being engaged in several ex post evaluations and iterations to reach the protection and information properties sought after. Moreover, the properties of this cipher point to some previously unknown general insights into the task of data anonymization considered at a general level of functioning. Finally, and to make the cipher operational, this paper proposes the introduction of permutation menus in data anonymization, where recently developed universal measures of disclosure risk and information loss are used ex ante for the calibration of permutation keys. To justify the relevance of their uses, a theoretical characterization of these measures is also proposed.

CRDec 7, 2017
A multiplicative masking method for preserving the skewness of the original micro-records

Nicolas Ruiz

Masking methods for the safe dissemination of microdata consist of distorting the original data while preserving a pre-defined set of statistical properties in the microdata. For continuous variables, available methodologies rely essentially on matrix masking and in particular on adding noise to the original values, using more or less refined procedures depending on the extent of information that one seeks to preserve. Almost all of these methods make use of the critical assumption that the original datasets follow a normal distribution and/or that the noise has such a distribution. This assumption is, however, restrictive in the sense that few variables follow empirically a Gaussian pattern: the distribution of household income, for example, is positively skewed, and this skewness is essential information that has to be considered and preserved. This paper addresses these issues by presenting a simple multiplicative masking method that preserves skewness of the original data while offering a sufficient level of disclosure risk control. Numerical examples are provided, leading to the suggestion that this method could be well-suited for the dissemination of a broad range of microdata, including those based on administrative and business records.

CRJan 29, 2017
On some consequences of the permutation paradigm for data anonymization: centrality of permutation matrices, universal measures of disclosure risk and information loss, evaluation by dominance

Nicolas Ruiz

Recently, the permutation paradigm has been proposed in data anonymization to describe any micro data masking method as permutation, paving the way for performing meaningful analytical comparisons of methods, something that is difficult currently in statistical disclosure control research. This paper explores some consequences of this paradigm by establishing some class of universal measures of disclosure risk and information loss that can be used for the evaluation and comparison of any method, under any parametrization and independently of the characteristics of the data to be anonymized. These measures lead to the introduction in data anonymization of the concepts of dominance in disclosure risk and information loss, which formalise the fact that different parties involved in micro data transaction can all have different sensitivities to privacy and information.