MLSep 21, 2022
Shape And Structure Preserving Differential PrivacyCarlos Soto, Karthik Bharath, Matthew Reimherr et al.
It is common for data structures such as images and shapes of 2D objects to be represented as points on a manifold. The utility of a mechanism to produce sanitized differentially private estimates from such data is intimately linked to how compatible it is with the underlying structure and geometry of the space. In particular, as recently shown, utility of the Laplace mechanism on a positively curved manifold, such as Kendall's 2D shape space, is significantly influences by the curvature. Focusing on the problem of sanitizing the Fréchet mean of a sample of points on a manifold, we exploit the characterisation of the mean as the minimizer of an objective function comprised of the sum of squared distances and develop a K-norm gradient mechanism on Riemannian manifolds that favors values that produce gradients close to the the zero of the objective function. For the case of positively curved manifolds, we describe how using the gradient of the squared distance function offers better control over sensitivity than the Laplace mechanism, and demonstrate this numerically on a dataset of shapes of corpus callosa. Further illustrations of the mechanism's utility on a sphere and the manifold of symmetric positive definite matrices are also presented.
CRSep 10, 2024
Gaussian Differentially Private Human Faces Under a Face Radial Curve RepresentationCarlos Soto, Matthew Reimherr, Aleksandra Slavkovic et al.
In this paper we consider the problem of releasing a Gaussian Differentially Private (GDP) 3D human face. The human face is a complex structure with many features and inherently tied to one's identity. Protecting this data, in a formally private way, is important yet challenging given the dimensionality of the problem. We extend approximate DP techniques for functional data to the GDP framework. We further propose a novel representation, face radial curves, of a 3D face as a set of functions and then utilize our proposed GDP functional data mechanism. To preserve the shape of the face while injecting noise we rely on tools from shape analysis for our novel representation of the face. We show that our method preserves the shape of the average face and injects less noise than traditional methods for the same privacy budget. Our mechanism consists of two primary components, the first is generally applicable to function value summaries (as are commonly found in nonparametric statistics or functional data analysis) while the second is general to disk-like surfaces and hence more applicable than just to human faces.
MLMay 17, 2022
Perfect Spectral Clustering with Discrete CovariatesJonathan Hehir, Xiaoyue Niu, Aleksandra Slavkovic
Among community detection methods, spectral clustering enjoys two desirable properties: computational efficiency and theoretical guarantees of consistency. Most studies of spectral clustering consider only the edges of a network as input to the algorithm. Here we consider the problem of performing community detection in the presence of discrete node covariates, where network structure is determined by a combination of a latent block model structure and homophily on the observed covariates. We propose a spectral algorithm that we prove achieves perfect clustering with high probability on a class of large, sparse networks with discrete covariates, effectively separating latent network structure from homophily on observed covariates. To our knowledge, our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering in the setting where edge formation is dependent on both latent and observed factors.
CRAug 5, 2021
Perturbed M-Estimation: A Further Investigation of Robust Statistics for Differential PrivacyAleksandra Slavkovic, Roberto Molinari
Differential Privacy (DP) provides an elegant mathematical framework for defining a provable disclosure risk in the presence of arbitrary adversaries; it guarantees that whether an individual is in a database or not, the results of a DP procedure should be similar in terms of their probability distribution. While DP mechanisms are provably effective in protecting privacy, they often negatively impact the utility of the query responses, statistics and/or analyses that come as outputs from these mechanisms. To address this problem, we use ideas from the area of robust statistics which aims at reducing the influence of outlying observations on statistical inference. Based on the preliminary known links between differential privacy and robust statistics, we modify the objective perturbation mechanism by making use of a new bounded function and define a bounded M-Estimator with adequate statistical properties. The resulting privacy mechanism, named "Perturbed M-Estimation", shows important potential in terms of improved statistical utility of its outputs as suggested by some preliminary results. These results consequently support the need to further investigate the use of robust statistical tools for differential privacy.
STMay 26, 2021
Consistent Spectral Clustering of Network Block Models under Local Differential PrivacyJonathan Hehir, Aleksandra Slavkovic, Xiaoyue Niu
The stochastic block model (SBM) and degree-corrected block model (DCBM) are network models often selected as the fundamental setting in which to analyze the theoretical properties of community detection methods. We consider the problem of spectral clustering of SBM and DCBM networks under a local form of edge differential privacy. Using a randomized response privacy mechanism called the edge-flip mechanism, we develop theoretical guarantees for differentially private community detection, demonstrating conditions under which this strong privacy guarantee can be upheld while achieving spectral clustering convergence rates that match the known rates without privacy. We prove the strongest theoretical results are achievable for dense networks (those with node degree linear in the number of nodes), while weak consistency is achievable under mild sparsity (node degree greater than $\sqrt{n}$). We empirically demonstrate our results on a number of network examples.
STMar 31, 2019
Differentially Private Inference for Binomial DataJordan Awan, Aleksandra Slavkovic
We derive uniformly most powerful (UMP) tests for simple and one-sided hypotheses for a population proportion within the framework of Differential Privacy (DP), optimizing finite sample performance. We show that in general, DP hypothesis tests can be written in terms of linear constraints, and for exchangeable data can always be expressed as a function of the empirical distribution. Using this structure, we prove a 'Neyman-Pearson lemma' for binomial data under DP, where the DP-UMP only depends on the sample sum. Our tests can also be stated as a post-processing of a random variable, whose distribution we coin ''Truncated-Uniform-Laplace'' (Tulap), a generalization of the Staircase and discrete Laplace distributions. Furthermore, we obtain exact $p$-values, which are easily computed in terms of the Tulap random variable. Using the above techniques, we show that our tests can be applied to give uniformly most accurate one-sided confidence intervals and optimal confidence distributions. We also derive uniformly most powerful unbiased (UMPU) two-sided tests, which lead to uniformly most accurate unbiased (UMAU) two-sided confidence intervals. We show that our results can be applied to distribution-free hypothesis tests for continuous data. Our simulation results demonstrate that all our tests have exact type I error, and are more powerful than current techniques.