74.1DSMar 30
Differentially Private Learning of Exponential Distributions: Simple Algorithms and Tight BoundsBar Mahpud, Or Sheffet
We study the problem of learning exponential distributions under differential privacy. Given $n$ i.i.d.\ samples from $\mathrm{Exp}(λ)$, the goal is to privately estimate $λ$ so that the learned distribution is close in total variation distance to the truth. We present a simple pure $ε$-differentially private algorithm that avoids the classical dependence on the true value of $λ$. Our method leverages a structural property of the exponential distribution: its $(1-1/e)$-quantile equals $1/λ$, allowing us to estimate the rate parameter directly via private quantile estimation. The resulting learner is both conceptually simple and sample-efficient, achieving near-optimal guarantees. We further extend the method to Pareto distributions via a logarithmic reduction, prove nearly matching lower bounds using group privacy arguments, and show how approximate $(ε,δ)$-DP removes the need for externally supplied parameter bounds. Together, these results give the first tight characterization of exponential distribution learning under differential privacy using a simple $λ$-free approach.
LGMay 21, 2025
Privacy-Preserving Conformal Prediction Under Local Differential PrivacyCoby Penso, Bar Mahpud, Jacob Goldberger et al.
Conformal prediction (CP) provides sets of candidate classes with a guaranteed probability of containing the true class. However, it typically relies on a calibration set with clean labels. We address privacy-sensitive scenarios where the aggregator is untrusted and can only access a perturbed version of the true labels. We propose two complementary approaches under local differential privacy (LDP). In the first approach, users do not access the model but instead provide their input features and a perturbed label using a k-ary randomized response. In the second approach, which enforces stricter privacy constraints, users add noise to their conformity score by binary search response. This method requires access to the classification model but preserves both data and label privacy. Both approaches compute the conformal threshold directly from noisy data without accessing the true labels. We prove finite-sample coverage guarantees and demonstrate robust coverage even under severe randomization. This approach unifies strong local privacy with predictive uncertainty control, making it well-suited for sensitive applications such as medical imaging or large language model queries, regardless of whether users can (or are willing to) compute their own scores.
LGMay 20, 2025
A Private Approximation of the 2nd-Moment Matrix of Any Subsamplable InputBar Mahpud, Or Sheffet
We study the problem of differentially private second moment estimation and present a new algorithm that achieve strong privacy-utility trade-offs even for worst-case inputs under subsamplability assumptions on the data. We call an input $(m,α,β)$-subsamplable if a random subsample of size $m$ (or larger) preserves w.p $\geq 1-β$ the spectral structure of the original second moment matrix up to a multiplicative factor of $1\pm α$. Building upon subsamplability, we give a recursive algorithmic framework similar to Kamath et al 2019, that abides zero-Concentrated Differential Privacy (zCDP) while preserving w.h.p. the accuracy of the second moment estimation upto an arbitrary factor of $(1\pmγ)$. We then show how to apply our algorithm to approximate the second moment matrix of a distribution $\mathcal{D}$, even when a noticeable fraction of the input are outliers.
LGJan 28, 2022
Transfer Learning In Differential Privacy's Hybrid-ModelRefael Kohen, Or Sheffet
The hybrid-model (Avent et al 2017) in Differential Privacy is a an augmentation of the local-model where in addition to N local-agents we are assisted by one special agent who is in fact a curator holding the sensitive details of n additional individuals. Here we study the problem of machine learning in the hybrid-model where the n individuals in the curators dataset are drawn from a different distribution than the one of the general population (the local-agents). We give a general scheme -- Subsample-Test-Reweigh -- for this transfer learning problem, which reduces any curator-model DP-learner to a hybrid-model learner in this setting using iterative subsampling and reweighing of the n examples held by the curator based on a smooth variation of the Multiplicative-Weights algorithm (introduced by Bun et al, 2020). Our scheme has a sample complexity which relies on the chi-squared divergence between the two distributions. We give worst-case analysis bounds on the sample complexity required for our private reduction. Aiming to reduce said sample complexity, we give two specific instances our sample complexity can be drastically reduced (one instance is analyzed mathematically, while the other - empirically) and pose several directions for follow-up work.
DSJul 16, 2020
Private Approximations of a Convex Hull in Low DimensionsYue Gao, Or Sheffet
We give the first differentially private algorithms that estimate a variety of geometric features of points in the Euclidean space, such as diameter, width, volume of convex hull, min-bounding box, min-enclosing ball etc. Our work relies heavily on the notion of \emph{Tukey-depth}. Instead of (non-privately) approximating the convex-hull of the given set of points $P$, our algorithms approximate the geometric features of the $κ$-Tukey region induced by $P$ (all points of Tukey-depth $κ$ or greater). Moreover, our approximations are all bi-criteria: for any geometric feature $μ$ our $(α,Δ)$-approximation is a value "sandwiched" between $(1-α)μ(D_P(κ))$ and $(1+α)μ(D_P(κ-Δ))$. Our work is aimed at producing a \emph{$(α,Δ)$-kernel of $D_P(κ)$}, namely a set $\mathcal{S}$ such that (after a shift) it holds that $(1-α)D_P(κ)\subset \mathsf{CH}(\mathcal{S}) \subset (1+α)D_P(κ-Δ)$. We show that an analogous notion of a bi-critera approximation of a directional kernel, as originally proposed by Agarwal et al~[2004], \emph{fails} to give a kernel, and so we result to subtler notions of approximations of projections that do yield a kernel. First, we give differentially private algorithms that find $(α,Δ)$-kernels for a "fat" Tukey-region. Then, based on a private approximation of the min-bounding box, we find a transformation that does turn $D_P(κ)$ into a "fat" region \emph{but only if} its volume is proportional to the volume of $D_P(κ-Δ)$. Lastly, we give a novel private algorithm that finds a depth parameter $κ$ for which the volume of $D_P(κ)$ is comparable to $D_P(κ-Δ)$. We hope this work leads to the further study of the intersection of differential privacy and computational geometry.
MLJun 11, 2020
Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a Differentially Private SchemeKontantinos E. Nikolakakis, Dionysios S. Kalogerias, Or Sheffet et al.
We study the best-arm identification problem in multi-armed bandits with stochastic, potentially private rewards, when the goal is to identify the arm with the highest quantile at a fixed, prescribed level. First, we propose a (non-private) successive elimination algorithm for strictly optimal best-arm identification, we show that our algorithm is $δ$-PAC and we characterize its sample complexity. Further, we provide a lower bound on the expected number of pulls, showing that the proposed algorithm is essentially optimal up to logarithmic factors. Both upper and lower complexity bounds depend on a special definition of the associated suboptimality gap, designed in particular for the quantile bandit problem, as we show when the gap approaches zero, best-arm identification is impossible. Second, motivated by applications where the rewards are private, we provide a differentially private successive elimination algorithm whose sample complexity is finite even for distributions with infinite support-size, and we characterize its sample complexity. Our algorithms do not require prior knowledge of either the suboptimality gap or other statistical information related to the bandit problem at hand.
DSDec 18, 2019
The power of synergy in differential privacy: Combining a small curator with local randomizersAmos Beimel, Aleksandra Korolova, Kobbi Nissim et al.
Motivated by the desire to bridge the utility gap between local and trusted curator models of differential privacy for practical applications, we initiate the theoretical study of a hybrid model introduced by "Blender" [Avent et al.,\ USENIX Security '17], in which differentially private protocols of n agents that work in the local-model are assisted by a differentially private curator that has access to the data of m additional users. We focus on the regime where m << n and study the new capabilities of this (m,n)-hybrid model. We show that, despite the fact that the hybrid model adds no significant new capabilities for the basic task of simple hypothesis-testing, there are many other tasks (under a wide range of parameters) that can be solved in the hybrid model yet cannot be solved either by the curator or by the local-users separately. Moreover, we exhibit additional tasks where at least one round of interaction between the curator and the local-users is necessary -- namely, no hybrid model protocol without such interaction can solve these tasks. Taken together, our results show that the combination of the local model with a small curator can become part of a promising toolkit for designing and implementing differential privacy.
DSSep 9, 2019
Differentially Private Algorithms for Learning Mixtures of Separated GaussiansGautam Kamath, Or Sheffet, Vikrant Singhal et al.
Learning the parameters of Gaussian mixture models is a fundamental and widely studied problem with numerous applications. In this work, we give new algorithms for learning the parameters of a high-dimensional, well separated, Gaussian mixture model subject to the strong constraint of differential privacy. In particular, we give a differentially private analogue of the algorithm of Achlioptas and McSherry. Our algorithm has two key properties not achieved by prior work: (1) The algorithm's sample complexity matches that of the corresponding non-private algorithm up to lower order terms in a wide range of parameters. (2) The algorithm does not require strong a priori bounds on the parameters of the mixture components.
MLMay 22, 2019
An Optimal Private Stochastic-MAB Algorithm Based on an Optimal Private Stopping RuleTouqir Sajed, Or Sheffet
We present a provably optimal differentially private algorithm for the stochastic multi-arm bandit problem, as opposed to the private analogue of the UCB-algorithm [Mishra and Thakurta, 2015; Tossou and Dimitrakakis, 2016] which doesn't meet the recently discovered lower-bound of $Ω\left(\frac{K\log(T)}ε \right)$ [Shariff and Sheffet, 2018]. Our construction is based on a different algorithm, Successive Elimination [Even-Dar et al. 2002], that repeatedly pulls all remaining arms until an arm is found to be suboptimal and is then eliminated. In order to devise a private analogue of Successive Elimination we visit the problem of private stopping rule, that takes as input a stream of i.i.d samples from an unknown distribution and returns a multiplicative $(1 \pm α)$-approximation of the distribution's mean, and prove the optimality of our private stopping rule. We then present the private Successive Elimination algorithm which meets both the non-private lower bound [Lai and Robbins, 1985] and the above-mentioned private lower bound. We also compare empirically the performance of our algorithm with the private UCB algorithm.
LGSep 28, 2018
Differentially Private Contextual Linear BanditsRoshan Shariff, Or Sheffet
We study the contextual linear bandit problem, a version of the standard stochastic multi-armed bandit (MAB) problem where a learner sequentially selects actions to maximize a reward which depends also on a user provided per-round context. Though the context is chosen arbitrarily or adversarially, the reward is assumed to be a stochastic function of a feature vector that encodes the context and selected action. Our goal is to devise private learners for the contextual linear bandit problem. We first show that using the standard definition of differential privacy results in linear regret. So instead, we adopt the notion of joint differential privacy, where we assume that the action chosen on day $t$ is only revealed to user $t$ and thus needn't be kept private that day, only on following days. We give a general scheme converting the classic linear-UCB algorithm into a joint differentially private algorithm using the tree-based algorithm. We then apply either Gaussian noise or Wishart noise to achieve joint-differentially private algorithms and bound the resulting algorithms' regrets. In addition, we give the first lower bound on the additional regret any private algorithms for the MAB problem must incur.
CRFeb 9, 2018
Locally Private Hypothesis TestingOr Sheffet
We initiate the study of differentially private hypothesis testing in the local-model, under both the standard (symmetric) randomized-response mechanism (Warner, 1965, Kasiviswanathan et al, 2008) and the newer (non-symmetric) mechanisms (Bassily and Smith, 2015, Bassily et al, 2017). First, we study the general framework of mapping each user's type into a signal and show that the problem of finding the maximum-likelihood distribution over the signals is feasible. Then we discuss the randomized-response mechanism and show that, in essence, it maps the null- and alternative-hypotheses onto new sets, an affine translation of the original sets. We then give sample complexity bounds for identity and independence testing under randomized-response. We then move to the newer non-symmetric mechanisms and show that there too the problem of finding the maximum-likelihood distribution is feasible. Under the mechanism of Bassily et al (2007) we give identity and independence testers with better sample complexity than the testers in the symmetric case, and we also propose a $χ^2$-based identity tester which we investigate empirically.
DSJul 9, 2015
Differentially Private Ordinary Least SquaresOr Sheffet
Linear regression is one of the most prevalent techniques in machine learning, however, it is also common to use linear regression for its \emph{explanatory} capabilities rather than label prediction. Ordinary Least Squares (OLS) is often used in statistics to establish a correlation between an attribute (e.g. gender) and a label (e.g. income) in the presence of other (potentially correlated) features. OLS assumes a particular model that randomly generates the data, and derives \emph{$t$-values} --- representing the likelihood of each real value to be the true correlation. Using $t$-values, OLS can release a \emph{confidence interval}, which is an interval on the reals that is likely to contain the true correlation, and when this interval does not intersect the origin, we can \emph{reject the null hypothesis} as it is likely that the true correlation is non-zero. Our work aims at achieving similar guarantees on data under differentially private estimators. First, we show that for well-spread data, the Gaussian Johnson-Lindenstrauss Transform (JLT) gives a very good approximation of $t$-values, secondly, when JLT approximates Ridge regression (linear regression with $l_2$-regularization) we derive, under certain conditions, confidence intervals using the projected data, lastly, we derive, under different conditions, confidence intervals for the "Analyze Gauss" algorithm (Dwork et al, STOC 2014).
LGOct 31, 2014
Learning Mixtures of Ranking ModelsPranjal Awasthi, Avrim Blum, Or Sheffet et al.
This work concerns learning probabilistic models for ranking data in a heterogeneous population. The specific problem we study is learning the parameters of a Mallows Mixture Model. Despite being widely studied, current heuristics for this problem do not have theoretical guarantees and can get stuck in bad local optima. We present the first polynomial time algorithm which provably learns the parameters of a mixture of two Mallows models. A key component of our algorithm is a novel use of tensor decomposition techniques to learn the top-k prefix in both the rankings. Before this work, even the question of identifiability in the case of a mixture of two Mallows models was unresolved.
CRFeb 20, 2013
Optimizing Password Composition PoliciesJeremiah Blocki, Saranga Komanduri, Ariel Procaccia et al.
A password composition policy restricts the space of allowable passwords to eliminate weak passwords that are vulnerable to statistical guessing attacks. Usability studies have demonstrated that existing password composition policies can sometimes result in weaker password distributions; hence a more principled approach is needed. We introduce the first theoretical model for optimizing password composition policies. We study the computational and sample complexity of this problem under different assumptions on the structure of policies and on users' preferences over passwords. Our main positive result is an algorithm that -- with high probability --- constructs almost optimal policies (which are specified as a union of subsets of allowed passwords), and requires only a small number of samples of users' preferred passwords. We complement our theoretical results with simulations using a real-world dataset of 32 million passwords.
CRAug 22, 2012
Differentially Private Data Analysis of Social Networks via Restricted SensitivityJeremiah Blocki, Avrim Blum, Anupam Datta et al.
We introduce the notion of restricted sensitivity as an alternative to global and smooth sensitivity to improve accuracy in differentially private data analysis. The definition of restricted sensitivity is similar to that of global sensitivity except that instead of quantifying over all possible datasets, we take advantage of any beliefs about the dataset that a querier may have, to quantify over a restricted class of datasets. Specifically, given a query f and a hypothesis H about the structure of a dataset D, we show generically how to transform f into a new query f_H whose global sensitivity (over all datasets including those that do not satisfy H) matches the restricted sensitivity of the query f. Moreover, if the belief of the querier is correct (i.e., D is in H) then f_H(D) = f(D). If the belief is incorrect, then f_H(D) may be inaccurate. We demonstrate the usefulness of this notion by considering the task of answering queries regarding social-networks, which we model as a combination of a graph and a labeling of its vertices. In particular, while our generic procedure is computationally inefficient, for the specific definition of H as graphs of bounded degree, we exhibit efficient ways of constructing f_H using different projection-based techniques. We then analyze two important query classes: subgraph counting queries (e.g., number of triangles) and local profile queries (e.g., number of people who know a spy and a computer-scientist who know each other). We demonstrate that the restricted sensitivity of such queries can be significantly lower than their smooth sensitivity. Thus, using restricted sensitivity we can maintain privacy whether or not D is in H, while providing more accurate results in the event that H holds true.
LGJun 27, 2012
Predicting Preference Flips in Commerce SearchOr Sheffet, Nina Mishra, Samuel Ieong
Traditional approaches to ranking in web search follow the paradigm of rank-by-score: a learned function gives each query-URL combination an absolute score and URLs are ranked according to this score. This paradigm ensures that if the score of one URL is better than another then one will always be ranked higher than the other. Scoring contradicts prior work in behavioral economics that showed that users' preferences between two items depend not only on the items but also on the presented alternatives. Thus, for the same query, users' preference between items A and B depends on the presence/absence of item C. We propose a new model of ranking, the Random Shopper Model, that allows and explains such behavior. In this model, each feature is viewed as a Markov chain over the items to be ranked, and the goal is to find a weighting of the features that best reflects their importance. We show that our model can be learned under the empirical risk minimization framework, and give an efficient learning algorithm. Experiments on commerce search logs demonstrate that our algorithm outperforms scoring-based approaches including regression and listwise ranking.
LGJun 14, 2012
Improved Spectral-Norm Bounds for ClusteringPranjal Awasthi, Or Sheffet
Aiming to unify known results about clustering mixtures of distributions under separation conditions, Kumar and Kannan[2010] introduced a deterministic condition for clustering datasets. They showed that this single deterministic condition encompasses many previously studied clustering assumptions. More specifically, their proximity condition requires that in the target $k$-clustering, the projection of a point $x$ onto the line joining its cluster center $μ$ and some other center $μ'$, is a large additive factor closer to $μ$ than to $μ'$. This additive factor can be roughly described as $k$ times the spectral norm of the matrix representing the differences between the given (known) dataset and the means of the (unknown) target clustering. Clearly, the proximity condition implies center separation -- the distance between any two centers must be as large as the above mentioned bound. In this paper we improve upon the work of Kumar and Kannan along several axes. First, we weaken the center separation bound by a factor of $\sqrt{k}$, and secondly we weaken the proximity condition by a factor of $k$. Using these weaker bounds we still achieve the same guarantees when all points satisfy the proximity condition. We also achieve better guarantees when only $(1-ε)$-fraction of the points satisfy the weaker proximity condition. The bulk of our analysis relies only on center separation under which one can produce a clustering which (i) has low error, (ii) has low $k$-means cost, and (iii) has centers very close to the target centers. Our improved separation condition allows us to match the results of the Planted Partition Model of McSherry[2001], improve upon the results of Ostrovsky et al[2006], and improve separation results for mixture of Gaussian models in a particular setting.