Martin Kroll

CR
4papers
85citations
Novelty55%
AI Score25

4 Papers

STJul 14, 2019
Pointwise adaptive kernel density estimation under local approximate differential privacy

Martin Kroll

We consider non-parametric density estimation in the framework of local approximate differential privacy. In contrast to centralized privacy scenarios with a trusted curator, in the local setup anonymization must be guaranteed already on the individual data owners' side and therefore must precede any data mining tasks. Thus, the published anonymized data should be compatible with as many statistical procedures as possible. We suggest adding Laplace noise and Gaussian processes (both appropriately scaled) to kernel density estimators to obtain approximate differential private versions of the latter ones. We obtain minimax type results over Sobolev classes indexed by a smoothness parameter $s>1/2$ for the mean squared error at a fixed point. In particular, we show that taking the average of private kernel density estimators from $n$ different data owners attains the optimal rate of convergence if the bandwidth parameter is correctly specified. Notably, the optimal convergence rate in terms of the sample size $n$ is $n^{-(2s-1)/(2s+1)}$ under local differential privacy and thus deteriorated to the rate $n^{-(2s-1)/(2s)}$ which holds without privacy restrictions. Since the optimal choice of the bandwidth parameter depends on the smoothness $s$ and is thus not accessible in practice, adaptive methods for bandwidth selection are necessary and must, in the local privacy framework, be performed directly on the anonymized data. We address this problem by means of a variant of Lepski's method tailored to the privacy setup and obtain general oracle inequalities for private kernel density estimators. In the Sobolev case, the resulting adaptive estimator attains the optimal rate of convergence at least up to extra logarithmic factors.

STMar 5, 2019
Local differential privacy: Elbow effect in optimal density estimation and adaptation over Besov ellipsoids

Cristina Butucea, Amandine Dubois, Martin Kroll et al.

We address the problem of non-parametric density estimation under the additional constraint that only privatised data are allowed to be published and available for inference. For this purpose, we adopt a recent generalisation of classical minimax theory to the framework of local $α$-differential privacy and provide a lower bound on the rate of convergence over Besov spaces $B^s_{pq}$ under mean integrated $\mathbb L^r$-risk. This lower bound is deteriorated compared to the standard setup without privacy, and reveals a twofold elbow effect. In order to fulfil the privacy requirement, we suggest adding suitably scaled Laplace noise to empirical wavelet coefficients. Upper bounds within (at most) a logarithmic factor are derived under the assumption that $α$ stays bounded as $n$ increases: A linear but non-adaptive wavelet estimator is shown to attain the lower bound whenever $p \geq r$ but provides a slower rate of convergence otherwise. An adaptive non-linear wavelet estimator with appropriately chosen smoothing parameters and thresholding is shown to attain the lower bound within a logarithmic factor for all cases.

CROct 24, 2014
Automated Cryptanalysis of Bloom Filter Encryptions of Health Records

Martin Kroll, Simone Steinmetzer

Privacy-preserving record linkage with Bloom filters has become increasingly popular in medical applications, since Bloom filters allow for probabilistic linkage of sensitive personal data. However, since evidence indicates that Bloom filters lack sufficiently high security where strong security guarantees are required, several suggestions for their improvement have been made in literature. One of those improvements proposes the storage of several identifiers in one single Bloom filter. In this paper we present an automated cryptanalysis of this Bloom filter variant. The three steps of this procedure constitute our main contributions: (1) a new method for the detection of Bloom filter encrytions of bigrams (so-called atoms), (2) the use of an optimization algorithm for the assignment of atoms to bigrams, (3) the reconstruction of the original attribute values by linkage against bigram sets obtained from lists of frequent attribute values in the underlying population. To sum up, our attack provides the first convincing attack on Bloom filter encryptions of records built from more than one identifier.

CRFeb 13, 2014
A graph theoretic linkage attack on microdata in a metric space

Martin Kroll

Certain methods of analysis require the knowledge of the spatial distances between entities whose data are stored in a microdata table. For instance, such knowledge is necessary and sufficient to perform data mining tasks such as nearest neighbour searches or clustering. However, when inter-record distances are published in addition to the microdata for research purposes, the risk of identity disclosure has to be taken into consideration again. In order to tackle this problem, we introduce a flexible graph model for microdata in a metric space and propose a linkage attack based on realistic assumptions of a data snooper's background knowledge. This attack is based on the idea of finding a maximum approximate common subgraph of two vertex-labelled and edge-weighted graphs. By adapting a standard argument from algorithmic graph theory to our setup, this task is transformed to the maximum clique detection problem in a corresponding product graph. Using a toy example and experimental results on simulated data show that publishing even approximate distances could increase the risk of identity disclosure unreasonably.