CRJul 14, 2023
Smooth Lower Bounds for Differentially Private Algorithms via Padding-and-Permuting Fingerprinting CodesNaty Peter, Eliad Tsfadia, Jonathan Ullman
Fingerprinting arguments, first introduced by Bun, Ullman, and Vadhan (STOC 2014), are the most widely used method for establishing lower bounds on the sample complexity or error of approximately differentially private (DP) algorithms. Still, there are many problems in differential privacy for which we don't know suitable lower bounds, and even for problems that we do, the lower bounds are not smooth, and usually become vacuous when the error is larger than some threshold. We present a new framework and tools to generate smooth lower bounds on the sample complexity of differentially private algorithms satisfying very weak accuracy. We illustrate the applicability of our method by providing new lower bounds in various settings: 1. A tight lower bound for DP averaging in the low-accuracy regime, which in particular implies a lower bound for the private 1-cluster problem introduced by Nissim, Stemmer, and Vadhan (PODS 2016). 2. A lower bound on the additive error of DP algorithms for approximate k-means clustering and general (k,z)-clustering, as a function of the multiplicative error, which is tight for a constant multiplication error. 3. A lower bound for estimating the top singular vector of a matrix under DP in low-accuracy regimes, which is a special case of DP subspace estimation studied by Singhal and Steinke (NeurIPS 2021). Our main technique is to apply a padding-and-permuting transformation to a fingerprinting code. However, rather than proving our results using a black-box access to an existing fingerprinting code (e.g., Tardos' code), we develop a new fingerprinting lemma that is stronger than those of Dwork et al. (FOCS 2015) and Bun et al. (SODA 2017), and prove our lower bounds directly from the lemma. Our lemma, in particular, gives a simpler fingerprinting code construction with optimal rate (up to polylogarithmic factors) that is of independent interest.
LGFeb 10
Computationally Efficient Replicable Learning of ParitiesMoshe Noivirt, Jessica Sorrell, Eliad Tsfadia
We study the computational relationship between replicability (Impagliazzo et al. [STOC `22], Ghazi et al. [NeurIPS `21]) and other stability notions. Specifically, we focus on replicable PAC learning and its connections to differential privacy (Dwork et al. [TCC 2006]) and to the statistical query (SQ) model (Kearns [JACM `98]). Statistically, it was known that differentially private learning and replicable learning are equivalent and strictly more powerful than SQ-learning. Yet, computationally, all previously known efficient (i.e., polynomial-time) replicable learning algorithms were confined to SQ-learnable tasks or restricted distributions, in contrast to differentially private learning. Our main contribution is the first computationally efficient replicable algorithm for realizable learning of parities over arbitrary distributions, a task that is known to be hard in the SQ-model, but possible under differential privacy. This result provides the first evidence that efficient replicable learning over general distributions strictly extends efficient SQ-learning, and is closer in power to efficient differentially private learning, despite computational separations between replicability and privacy. Our main building block is a new, efficient, and replicable algorithm that, given a set of vectors, outputs a subspace of their linear span that covers most of them.
LGFeb 9, 2024
On Differentially Private Subspace Estimation in a Distribution-Free SettingEliad Tsfadia
Private data analysis faces a significant challenge known as the curse of dimensionality, leading to increased costs. However, many datasets possess an inherent low-dimensional structure. For instance, during optimization via gradient descent, the gradients frequently reside near a low-dimensional subspace. If the low-dimensional structure could be privately identified using a small amount of points, we could avoid paying for the high ambient dimension. On the negative side, Dwork, Talwar, Thakurta, and Zhang (STOC 2014) proved that privately estimating subspaces, in general, requires an amount of points that has a polynomial dependency on the dimension. However, their bounds do not rule out the possibility to reduce the number of points for "easy" instances. Yet, providing a measure that captures how much a given dataset is "easy" for this task turns out to be challenging, and was not properly addressed in prior works. Inspired by the work of Singhal and Steinke (NeurIPS 2021), we provide the first measures that quantify "easiness" as a function of multiplicative singular-value gaps in the input dataset, and support them with new upper and lower bounds. In particular, our results determine the first types of gaps that are sufficient and necessary for estimating a subspace with an amount of points that is independent of the dimension. Furthermore, we realize our upper bounds using a practical algorithm and demonstrate its advantage in high-dimensional regimes compared to prior approaches.
LGMay 24, 2023
Adaptive Data Analysis in a Balanced Adversarial ModelKobbi Nissim, Uri Stemmer, Eliad Tsfadia
In adaptive data analysis, a mechanism gets $n$ i.i.d. samples from an unknown distribution $D$, and is required to provide accurate estimations to a sequence of adaptively chosen statistical queries with respect to $D$. Hardt and Ullman (FOCS 2014) and Steinke and Ullman (COLT 2015) showed that in general, it is computationally hard to answer more than $Θ(n^2)$ adaptive queries, assuming the existence of one-way functions. However, these negative results strongly rely on an adversarial model that significantly advantages the adversarial analyst over the mechanism, as the analyst, who chooses the adaptive queries, also chooses the underlying distribution $D$. This imbalance raises questions with respect to the applicability of the obtained hardness results -- an analyst who has complete knowledge of the underlying distribution $D$ would have little need, if at all, to issue statistical queries to a mechanism which only holds a finite number of samples from $D$. We consider more restricted adversaries, called \emph{balanced}, where each such adversary consists of two separated algorithms: The \emph{sampler} who is the entity that chooses the distribution and provides the samples to the mechanism, and the \emph{analyst} who chooses the adaptive queries, but has no prior knowledge of the underlying distribution (and hence has no a priori advantage with respect to the mechanism). We improve the quality of previous lower bounds by revisiting them using an efficient \emph{balanced} adversary, under standard public-key cryptography assumptions. We show that these stronger hardness assumptions are unavoidable in the sense that any computationally bounded \emph{balanced} adversary that has the structure of all known attacks, implies the existence of public-key cryptography.
LGDec 29, 2021
Differentially-Private Clustering of Easy InstancesEdith Cohen, Haim Kaplan, Yishay Mansour et al.
Clustering is a fundamental problem in data analysis. In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points. Despite significant research progress, the problem had so far resisted practical solutions. In this work we aim at providing simple implementable differentially private clustering algorithms that provide utility when the data is "easy," e.g., when there exists a significant separation between the clusters. We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results. We are able to get improved sample complexity bounds in some cases of Gaussian mixtures and $k$-means. We complement our theoretical analysis with an empirical evaluation on synthetic data.
LGOct 19, 2021
FriendlyCore: Practical Differentially Private AggregationEliad Tsfadia, Edith Cohen, Haim Kaplan et al.
Differentially private algorithms for common metric aggregation tasks, such as clustering or averaging, often have limited practicality due to their complexity or to the large number of data points that is required for accurate results. We propose a simple and practical tool, $\mathsf{FriendlyCore}$, that takes a set of points ${\cal D}$ from an unrestricted (pseudo) metric space as input. When ${\cal D}$ has effective diameter $r$, $\mathsf{FriendlyCore}$ returns a "stable" subset ${\cal C} \subseteq {\cal D}$ that includes all points, except possibly few outliers, and is {\em certified} to have diameter $r$. $\mathsf{FriendlyCore}$ can be used to preprocess the input before privately aggregating it, potentially simplifying the aggregation or boosting its accuracy. Surprisingly, $\mathsf{FriendlyCore}$ is light-weight with no dependence on the dimension. We empirically demonstrate its advantages in boosting the accuracy of mean estimation and clustering tasks such as $k$-means and $k$-GMM, outperforming tailored methods.
CRAug 17, 2021
On the Complexity of Two-Party Differential PrivacyIftach Haitner, Noam Mazor, Jad Silbak et al.
In distributed differential privacy, the parties perform analysis over their joint data while preserving the privacy for both datasets. Interestingly, for a few fundamental two-party functions such as inner product and Hamming distance, the accuracy of the distributed solution lags way behind what is achievable in the client-server setting. McGregor, Mironov, Pitassi, Reingold, Talwar, and Vadhan [FOCS '10] proved that this gap is inherent, showing upper bounds on the accuracy of (any) distributed solution for these functions. These limitations can be bypassed when settling for computational differential privacy, where the data is differentially private only in the eyes of a computationally bounded observer, using public-key cryptography primitives. We prove that the use of public-key cryptography is necessary for bypassing the limitation of McGregor et al., showing that a non-trivial solution for the inner-product, or the Hamming distance, implies the existence of a key-agreement protocol. Our bound implies a combinatorial proof for the fact that non-Boolean inner product of independent (strong) Santha-Vazirani sources is a good condenser. We obtain our main result by showing that the inner-product of a (single, strong) SV source with a uniformly random seed is a good condenser, even when the seed and source are dependent.
CRMay 3, 2021
An Almost-Optimally Fair Three-Party Coin-Flipping ProtocolIftach Haitner, Eliad Tsfadia
In a multiparty fair coin-flipping protocol, the parties output a common (close to) unbiased bit, even when some corrupted parties try to bias the output. Cleve [STOC 1986] has shown that in the case of dishonest majority (i.e., at least half of the parties can be corrupted), in any $m$-round coin-flipping protocol the corrupted parties can bias the honest parties' common output bit by $Ω(\frac1{m})$. For more than two decades the best known coin-flipping protocols against dishonest majority had bias $Θ(\frac{\ell}{\sqrt{m}})$, where $\ell$ is the number of corrupted parties. This was changed by a recent breakthrough result of Moran et al. [TCC 2009], who constructed an $m$-round, two-party coin-flipping protocol with optimal bias $Θ(\frac1{m})$. In a subsequent work, Beimel et al. [Crypto 2010] extended this result to the multiparty case in which less than $\frac23$ of the parties can be corrupted. Still for the case of $\frac23$ (or more) corrupted parties, the best known protocol had bias $Θ(\frac{\ell}{\sqrt{m}})$. In particular, this was the state of affairs for the natural three-party case. We make a step towards eliminating the above gap, presenting an $m$-round, three-party coin-flipping protocol, with bias $\frac{O(\log^3 m)}m$. Our approach (which we also apply for the two-party case) does not follow the "threshold round" paradigm used in the work of Moran et al. and Beimel et al., but rather is a variation of the majority protocol of Cleve, used to obtain the aforementioned $Θ(\frac{\ell}{\sqrt{m}})$-bias protocol.
CRMay 3, 2021
A Tight Parallel Repetition Theorem for Partially Simulatable Interactive Arguments via Smooth KL-DivergenceItay Berman, Iftach Haitner, Eliad Tsfadia
Hardness amplification is a central problem in the study of interactive protocols. While ``natural'' parallel repetition transformation is known to reduce the soundness error of some special cases of interactive arguments: three-message protocols and public-coin protocols, it fails to do so in the general case. The only known round-preserving approach that applies to all interactive arguments is Haitner's random-terminating transformation [SICOMP '13], who showed that the parallel repetition of the transformed protocol reduces the soundness error at a weak exponential rate: if the original $m$-round protocol has soundness error $1-p$, then the $n$-parallel repetition of its random-terminating variant has soundness error $(1-p)^{p n / m^4}$ (omitting constant factors). Hastad et al. [TCC '10] have generalized this result to partially simulatable interactive arguments, showing that the $n$-fold repetition of an $m$-round $δ$-simulatable argument of soundness error $1-p$ has soundness error $(1-p)^{p δ^2 n / m^2}$. When applied to random-terminating arguments, the Hastad et al. bound matches that of Haitner. In this work we prove that parallel repetition of random-terminating arguments reduces the soundness error at a much stronger exponential rate: the soundness error of the $n$ parallel repetition is $(1-p)^{n / m}$, only an $m$ factor from the optimal rate of $(1-p)^n$ achievable in public-coin and three-message arguments. The result generalizes to $δ$-simulatable arguments, for which we prove a bound of $(1-p)^{δn / m}$. This is achieved by presenting a tight bound on a relaxed variant of the KL-divergence between the distribution induced by our reduction and its ideal variant, a result whose scope extends beyond parallel repetition proofs. We prove the tightness of the above bound for random-terminating arguments, by presenting a matching protocol.
CRApr 18, 2021
Fair Coin Flipping: Tighter Analysis and the Many-Party CaseNiv Buchbinder, Iftach Haitner, Nissan Levi et al.
In a multi-party fair coin-flipping protocol, the parties output a common (close to) unbiased bit, even when some adversarial parties try to bias the output. In this work we focus on the case of an arbitrary number of corrupted parties. Cleve [STOC 1986] has shown that in any such $m$-round coin-flipping protocol, the corrupted parties can bias the honest parties' common output bit by $Θ(1/m)$. For more than two decades, the best known coin-flipping protocol was the one of Awerbuch et al. [Manuscript 1985], who presented a $t$-party, $m$-round protocol with bias $Θ(t/\sqrt{m})$. This was changed by the breakthrough result of Moran et al. [TCC 2009], who constructed an $m$-round, two-party coin-flipping protocol with optimal bias $Θ(1/m)$. Haitner and Tsfadia [STOC 2014] constructed an $m$-round, three-party coin-flipping protocol with bias $O(\log^3m / m)$. Still for the case of more than three parties, the best known protocol remained the $Θ(t/\sqrt{m})$-bias protocol of Awerbuch et al. We make a step towards eliminating the above gap, presenting a $t$-party, $m$-round coin-flipping protocol, with bias $O(\frac{t^4 \cdot 2^t \cdot \sqrt{\log m}}{m^{1/2+1/\left(2^{t-1}-2\right)}})$ for any $t\le \tfrac12 \log\log m$. This improves upon the $Θ(t/\sqrt{m})$-bias protocol of Awerbuch et al., and in particular, for $t\in O(1)$ it is an $1/m^{\frac12 + Θ(1)}$-bias protocol. For the three-party case, it is an $O(\sqrt{\log m}/m)$-bias protocol, improving over the $O(\log^3m / m)$-bias protocol of Haitner and Tsfadia. Our protocol generalizes that of Haitner and Tsfadia, by presenting an appropriate recovery protocol for the remaining parties to interact in, in the case that some parties abort or are caught cheating. We prove the fairness of the new protocol by presenting a new paradigm for analyzing fairness of coin-flipping protocols.
LGApr 16, 2020
Private Learning of Halfspaces: Simplifying the Construction and Reducing the Sample ComplexityHaim Kaplan, Yishay Mansour, Uri Stemmer et al.
We present a differentially private learner for halfspaces over a finite grid $G$ in $\mathbb{R}^d$ with sample complexity $\approx d^{2.5}\cdot 2^{\log^*|G|}$, which improves the state-of-the-art result of [Beimel et al., COLT 2019] by a $d^2$ factor. The building block for our learner is a new differentially private algorithm for approximately solving the linear feasibility problem: Given a feasible collection of $m$ linear constraints of the form $Ax\geq b$, the task is to privately identify a solution $x$ that satisfies most of the constraints. Our algorithm is iterative, where each iteration determines the next coordinate of the constructed solution $x$.
CRAug 8, 2018
It Takes Two to #MeToo - Using Enclaves to Build Autonomous Trusted SystemsDanny Harnik, Paula Ta-Shma, Eliad Tsfadia
We provide enhanced security against insider attacks in services that manage extremely sensitive data. One example is a #MeToo use case where sexual harassment complaints are reported but only revealed when another complaint is filed against the same perpetrator. Such a service places tremendous trust on service operators which our work aims to relieve. To this end we introduce a new autonomous data management concept which transfers responsibility for the sensitive data from administrators to secure and verifiable hardware. The main idea is to manage all data access via a cluster of autonomous computation agents running inside Intel SGX enclaves. These EConfidante agents share a secret data key which is unknown to any external entity, including the data service administrators, thus eliminating many opportunities for data exposure. In this paper we describe a detailed design of the EConfidante system, its flow and how it is managed and implemented. Our #MeToo design also uses an immutable distributed ledger which is built using components from a Blockchain framework. We implemented a proof of concept of our system for the #MeToo use case and analyze its security properties and implementation details.
CRJun 28, 2018
Securing the Storage Data Path with SGX EnclavesDanny Harnik, Eliad Tsfadia, Doron Chen et al.
We explore the use of SGX enclaves as a means to improve the security of handling keys and data in storage systems. We study two main configurations for SGX computations, as they apply to performing data-at-rest encryption in a storage system. The first configuration aims to protect the encryption keys used in the encryption process. The second configuration aims to protect both the encryption keys and the data, thus providing end-to-end security of the entire data path. Our main contribution is an evaluation of the viability of SGX for data-at-rest encryption from a performance perspective and an understanding of the details that go into using enclaves in a performance sensitive environment. Our tests paint a complex picture: On the one hand SGX can indeed achieve high encryption and decryption throughput, comparable to running without SGX. On the other hand, there are many subtleties to achieving such performance and careful design choices and testing are required.