LGFeb 15, 2023
Tight Auditing of Differentially Private Machine LearningMilad Nasr, Jamie Hayes, Thomas Steinke et al. · eth-zurich
Auditing mechanisms for differential privacy use probabilistic means to empirically estimate the privacy level of an algorithm. For private machine learning, existing auditing mechanisms are tight: the empirical privacy estimate (nearly) matches the algorithm's provable privacy guarantee. But these auditing techniques suffer from two limitations. First, they only give tight estimates under implausible worst-case assumptions (e.g., a fully adversarial dataset). Second, they require thousands or millions of training runs to produce non-trivial statistical estimates of the privacy leakage. This work addresses both issues. We design an improved auditing scheme that yields tight privacy estimates for natural (not adversarially crafted) datasets -- if the adversary can see all model updates during training. Prior auditing works rely on the same assumption, which is permitted under the standard differential privacy threat model. This threat model is also applicable, e.g., in federated learning settings. Moreover, our auditing scheme requires only two training runs (instead of thousands) to produce tight privacy estimates, by adapting recent advances in tight composition theorems for differential privacy. We demonstrate the utility of our improved auditing schemes by surfacing implementation bugs in private machine learning code that eluded prior auditing techniques.
LGOct 10, 2023
Correlated Noise Provably Beats Independent Noise for Differentially Private LearningChristopher A. Choquette-Choo, Krishnamurthy Dvijotham, Krishna Pillutla et al. · deepmind
Differentially private learning algorithms inject noise into the learning process. While the most common private learning algorithm, DP-SGD, adds independent Gaussian noise in each iteration, recent work on matrix factorization mechanisms has shown empirically that introducing correlations in the noise can greatly improve their utility. We characterize the asymptotic learning utility for any choice of the correlation function, giving precise analytical bounds for linear regression and as the solution to a convex program for general convex functions. We show, using these bounds, how correlated noise provably improves upon vanilla DP-SGD as a function of problem parameters such as the effective dimension and condition number. Moreover, our analytical expression for the near-optimal correlation function circumvents the cubic complexity of the semi-definite program used to optimize the noise correlation matrix in previous work. We validate our theory with experiments on private deep learning. Our work matches or outperforms prior work while being efficient both in terms of compute and memory.
LGOct 24, 2023
Privacy Amplification for Matrix MechanismsChristopher A. Choquette-Choo, Arun Ganesh, Thomas Steinke et al. · deepmind
Privacy amplification exploits randomness in data selection to provide tighter differential privacy (DP) guarantees. This analysis is key to DP-SGD's success in machine learning, but, is not readily applicable to the newer state-of-the-art algorithms. This is because these algorithms, known as DP-FTRL, use the matrix mechanism to add correlated noise instead of independent noise as in DP-SGD. In this paper, we propose "MMCC", the first algorithm to analyze privacy amplification via sampling for any generic matrix mechanism. MMCC is nearly tight in that it approaches a lower bound as $ε\to0$. To analyze correlated outputs in MMCC, we prove that they can be analyzed as if they were independent, by conditioning them on prior outputs. Our "conditional composition theorem" has broad utility: we use it to show that the noise added to binary-tree-DP-FTRL can asymptotically match the noise added to DP-SGD with amplification. Our amplification algorithm also has practical empirical utility: we show it leads to significant improvement in the privacy-utility trade-offs for DP-FTRL algorithms on standard benchmarks.
CROct 2, 2022
Composition of Differential Privacy & Privacy Amplification by SubsamplingThomas Steinke
This chapter is meant to be part of the book "Differential Privacy for Artificial Intelligence Applications." We give an introduction to the most important property of differential privacy -- composition: running multiple independent analyses on the data of a set of people will still be differentially private as long as each of the analyses is private on its own -- as well as the related topic of privacy amplification by subsampling. This chapter introduces the basic concepts and gives proofs of the key results needed to apply these tools in practice.
LGFeb 19, 2023
Why Is Public Pretraining Necessary for Private Model Training?Arun Ganesh, Mahdi Haghifam, Milad Nasr et al.
In the privacy-utility tradeoff of a model trained on benchmark language and vision tasks, remarkable improvements have been widely reported with the use of pretraining on publicly available data. This is in part due to the benefits of transfer learning, which is the standard motivation for pretraining in non-private settings. However, the stark contrast in the improvement achieved through pretraining under privacy compared to non-private settings suggests that there may be a deeper, distinct cause driving these gains. To explain this phenomenon, we hypothesize that the non-convex loss landscape of a model training necessitates an optimization algorithm to go through two phases. In the first, the algorithm needs to select a good "basin" in the loss landscape. In the second, the algorithm solves an easy optimization within that basin. The former is a harder problem to solve with private data, while the latter is harder to solve with public data due to a distribution shift or data scarcity. Guided by this intuition, we provide theoretical constructions that provably demonstrate the separation between private training with and without public pretraining. Further, systematic experiments on CIFAR10 and LibriSpeech provide supporting evidence for our hypothesis.
CRSep 8, 2022
Algorithms with More Granular Differential Privacy GuaranteesBadih Ghazi, Ravi Kumar, Pasin Manurangsi et al.
Differential privacy is often applied with a privacy parameter that is larger than the theory suggests is ideal; various informal justifications for tolerating large privacy parameters have been proposed. In this work, we consider partial differential privacy (DP), which allows quantifying the privacy guarantee on a per-attribute basis. In this framework, we study several basic data analysis and learning tasks, and design algorithms whose per-attribute privacy parameter is smaller that the best possible privacy parameter for the entire record of a person (i.e., all the attributes).
NAJun 2, 2010
A Rigorous Extension of the Schönhage-Strassen Integer Multiplication Algorithm Using Complex Interval ArithmeticThomas Steinke, Raazesh Sainudiin
Multiplication of n-digit integers by long multiplication requires O(n^2) operations and can be time-consuming. In 1970 A. Schoenhage and V. Strassen published an algorithm capable of performing the task with only O(n log(n)) arithmetic operations over the complex field C; naturally, finite-precision approximations to C are used and rounding errors need to be accounted for. Overall, using variable-precision fixed-point numbers, this results in an O(n(log(n))^(2+Epsilon))-time algorithm. However, to make this algorithm more efficient and practical we need to make use of hardware-based floating-point numbers. How do we deal with rounding errors? and how do we determine the limits of the fixed-precision hardware? Our solution is to use interval arithmetic to guarantee the correctness of results and determine the hardware's limits. We examine the feasibility of this approach and are able to report that 75,000-digit base-256 integers can be handled using double-precision containment sets. This clearly demonstrates that our approach has practical potential; however, at this stage, our implementation does not yet compete with commercial ones, but we are able to demonstrate the feasibility of this technique.
89.9CRMay 14
Adapting AlphaEvolve to Optimize Fully Homomorphic Encryption on TPUsShruthi Gorantala, Jianming Tong, Asra Ali et al.
The deployment of Fully Homomorphic Encryption (FHE) at scale is hindered due to its heavy computational overhead. While specialized hardware accelerators like Google Tensor Processing Units (TPUs) can help, mapping complex cryptographic kernels onto such architectures remains a challenge. Efficient execution requires co-optimization between the systolic array-based Matrix Multiplication Unit (MXU) and Vector Processing Units (VPUs), as well as the orchestration of data movement across the vector register files. Existing compiler stacks often abstract low-level hardware utilization, requiring developers to adopt a manual trial-and-error process that often results in fragmented execution and underutilized resources. To accelerate this development process, we use AlphaEvolve to automate the exploration of hardware-aware cryptographic-kernel optimizations. We frame optimization as an evolutionary search problem, utilizing the closed-loop system provided by AlphaEvolve, that leverages LLM-driven code generation. We use real-world feedback from hardware execution and rigorous correctness testing to guide the evolution process. We evaluate AlphaEvolve optimization on primitives for both the TFHE (Jaxite) and CKKS (CROSS) FHE schemes on Google Cloud TPUv5e, a contemporary TPU architecture. Within 24 hours of automated exploration, AlphaEvolve discovered implementation-level optimizations that improve TFHE bootstrap latency by 2.5x and CKKS rotation and multiplication latency by 1.31x and 1.18x, respectively, relative to human-engineered state of the art. These results demonstrate that AlphaEvolve can be used to enable researchers to navigate the optimization trade-offs between cryptography, compilers, and hardware accelerators.
LGFeb 24, 2022Code
Debugging Differential Privacy: A Case Study for Privacy AuditingFlorian Tramer, Andreas Terzis, Thomas Steinke et al.
Differential Privacy can provide provable privacy guarantees for training data in machine learning. However, the presence of proofs does not preclude the presence of errors. Inspired by recent advances in auditing which have been used for estimating lower bounds on differentially private algorithms, here we show that auditing can also be used to find flaws in (purportedly) differentially private schemes. In this case study, we audit a recent open source implementation of a differentially private deep learning algorithm and find, with 99.99999999% confidence, that the implementation does not satisfy the claimed differential privacy guarantee.
DSApr 25, 2024
Efficient and Near-Optimal Noise Generation for Streaming Differential PrivacyKrishnamurthy Dvijotham, H. Brendan McMahan, Krishna Pillutla et al.
In the task of differentially private (DP) continual counting, we receive a stream of increments and our goal is to output an approximate running total of these increments, without revealing too much about any specific increment. Despite its simplicity, differentially private continual counting has attracted significant attention both in theory and in practice. Existing algorithms for differentially private continual counting are either inefficient in terms of their space usage or add an excessive amount of noise, inducing suboptimal utility. The most practical DP continual counting algorithms add carefully correlated Gaussian noise to the values. The task of choosing the covariance for this noise can be expressed in terms of factoring the lower-triangular matrix of ones (which computes prefix sums). We present two approaches from this class (for different parameter regimes) that achieve near-optimal utility for DP continual counting and only require logarithmic or polylogarithmic space (and time). Our first approach is based on a space-efficient streaming matrix multiplication algorithm for a class of Toeplitz matrices. We show that to instantiate this algorithm for DP continual counting, it is sufficient to find a low-degree rational function that approximates the square root on a circle in the complex plane. We then apply and extend tools from approximation theory to achieve this. We also derive efficient closed-forms for the objective function for arbitrarily many steps, and show direct numerical optimization yields a highly practical solution to the problem. Our second approach combines our first approach with a recursive construction similar to the binary tree mechanism.
LGJun 9, 2025
Correlated Noise Mechanisms for Differentially Private LearningKrishna Pillutla, Jalaj Upadhyay, Christopher A. Choquette-Choo et al.
This monograph explores the design and analysis of correlated noise mechanisms for differential privacy (DP), focusing on their application to private training of AI and machine learning models via the core primitive of estimation of weighted prefix sums. While typical DP mechanisms inject independent noise into each step of a stochastic gradient (SGD) learning algorithm in order to protect the privacy of the training data, a growing body of recent research demonstrates that introducing (anti-)correlations in the noise can significantly improve privacy-utility trade-offs by carefully canceling out some of the noise added on earlier steps in subsequent steps. Such correlated noise mechanisms, known variously as matrix mechanisms, factorization mechanisms, and DP-Follow-the-Regularized-Leader (DP-FTRL) when applied to learning algorithms, have also been influential in practice, with industrial deployment at a global scale.
CRSep 30, 2025
Privately Estimating Black-Box StatisticsGünter F. Steinke, Thomas Steinke
Standard techniques for differentially private estimation, such as Laplace or Gaussian noise addition, require guaranteed bounds on the sensitivity of the estimator in question. But such sensitivity bounds are often large or simply unknown. Thus we seek differentially private methods that can be applied to arbitrary black-box functions. A handful of such techniques exist, but all are either inefficient in their use of data or require evaluating the function on exponentially many inputs. In this work we present a scheme that trades off between statistical efficiency (i.e., how much data is needed) and oracle efficiency (i.e., the number of evaluations). We also present lower bounds showing the near-optimality of our scheme.
LGJun 11, 2024
Private Geometric MedianMahdi Haghifam, Thomas Steinke, Jonathan Ullman
In this paper, we study differentially private (DP) algorithms for computing the geometric median (GM) of a dataset: Given $n$ points, $x_1,\dots,x_n$ in $\mathbb{R}^d$, the goal is to find a point $θ$ that minimizes the sum of the Euclidean distances to these points, i.e., $\sum_{i=1}^{n} \|θ- x_i\|_2$. Off-the-shelf methods, such as DP-GD, require strong a priori knowledge locating the data within a ball of radius $R$, and the excess risk of the algorithm depends linearly on $R$. In this paper, we ask: can we design an efficient and private algorithm with an excess error guarantee that scales with the (unknown) radius containing the majority of the datapoints? Our main contribution is a pair of polynomial-time DP algorithms for the task of private GM with an excess error guarantee that scales with the effective diameter of the datapoints. Additionally, we propose an inefficient algorithm based on the inverse smooth sensitivity mechanism, which satisfies the more restrictive notion of pure DP. We complement our results with a lower bound and demonstrate the optimality of our polynomial-time algorithms in terms of sample complexity.
DSMay 22, 2023
Differentially Private Medians and Interior Points for Non-Pathological DataMaryam Aliakbarpour, Rose Silver, Thomas Steinke et al.
We construct differentially private estimators with low sample complexity that estimate the median of an arbitrary distribution over $\mathbb{R}$ satisfying very mild moment conditions. Our result stands in contrast to the surprising negative result of Bun et al. (FOCS 2015) that showed there is no differentially private estimator with any finite sample complexity that returns any non-trivial approximation to the median of an arbitrary distribution.
LGMay 22, 2023
Faster Differentially Private Convex Optimization via Second-Order MethodsArun Ganesh, Mahdi Haghifam, Thomas Steinke et al.
Differentially private (stochastic) gradient descent is the workhorse of DP private machine learning in both the convex and non-convex settings. Without privacy constraints, second-order methods, like Newton's method, converge faster than first-order methods like gradient descent. In this work, we investigate the prospect of using the second-order information from the loss function to accelerate DP convex optimization. We first develop a private variant of the regularized cubic Newton method of Nesterov and Polyak, and show that for the class of strongly convex loss functions, our algorithm has quadratic convergence and achieves the optimal excess loss. We then design a practical second-order DP algorithm for the unconstrained logistic regression problem. We theoretically and empirically study the performance of our algorithm. Empirical results show our algorithm consistently achieves the best excess loss compared to other baselines and is 10-40x faster than DP-GD/DP-SGD.
LGMay 15, 2023
Privacy Auditing with One (1) Training RunThomas Steinke, Milad Nasr, Matthew Jagielski
We propose a scheme for auditing differentially private machine learning systems with a single training run. This exploits the parallelism of being able to add or remove multiple training examples independently. We analyze this using the connection between differential privacy and statistical generalization, which avoids the cost of group privacy. Our auditing scheme requires minimal assumptions about the algorithm and can be applied in the black-box or white-box setting.
LGDec 1, 2021
Public Data-Assisted Mirror Descent for Private Model TrainingEhsan Amid, Arun Ganesh, Rajiv Mathews et al.
In this paper, we revisit the problem of using in-distribution public data to improve the privacy/utility trade-offs for differentially private (DP) model training. (Here, public data refers to auxiliary data sets that have no privacy concerns.) We design a natural variant of DP mirror descent, where the DP gradients of the private/sensitive data act as the linear term, and the loss generated by the public data as the mirror map. We show that, for linear regression with feature vectors drawn from a non-isotropic sub-Gaussian distribution, our algorithm, PDA-DPMD (a variant of mirror descent), provides population risk guarantees that are asymptotically better than the best known guarantees under DP (without having access to public data), when the number of public data samples ($n_{\sf pub}$) is sufficiently large. We further show that our algorithm has natural "noise stability" properties that control the variance due to noise added to ensure DP. We demonstrate the efficacy of our algorithm by showing privacy/utility trade-offs on four benchmark datasets (StackOverflow, WikiText-2, CIFAR-10, and EMNIST). We show that our algorithm not only significantly improves over traditional DP-SGD, which does not have access to public data, but to our knowledge is the first to improve over DP-SGD on models that have been pre-trained with public data.
MLNov 8, 2021
A Private and Computationally-Efficient Estimator for Unbounded GaussiansGautam Kamath, Argyris Mouzakis, Vikrant Singhal et al.
We give the first polynomial-time, polynomial-sample, differentially private estimator for the mean and covariance of an arbitrary Gaussian distribution $\mathcal{N}(μ,Σ)$ in $\mathbb{R}^d$. All previous estimators are either nonconstructive, with unbounded running time, or require the user to specify a priori bounds on the parameters $μ$ and $Σ$. The primary new technical tool in our algorithm is a new differentially private preconditioner that takes samples from an arbitrary Gaussian $\mathcal{N}(0,Σ)$ and returns a matrix $A$ such that $A ΣA^T$ has constant condition number.
LGOct 7, 2021
Hyperparameter Tuning with Renyi Differential PrivacyNicolas Papernot, Thomas Steinke
For many differentially private algorithms, such as the prominent noisy stochastic gradient descent (DP-SGD), the analysis needed to bound the privacy leakage of a single training run is well understood. However, few studies have reasoned about the privacy leakage resulting from the multiple training runs needed to fine tune the value of the training algorithm's hyperparameters. In this work, we first illustrate how simply setting hyperparameters based on non-private training runs can leak private information. Motivated by this observation, we then provide privacy guarantees for hyperparameter search procedures within the framework of Renyi Differential Privacy. Our results improve and extend the work of Liu and Talwar (STOC 2019). Our analysis supports our previous observation that tuning hyperparameters does indeed leak private information, but we prove that, under certain assumptions, this leakage is modest, as long as each candidate training run needed to select hyperparameters is itself differentially private.
LGJun 17, 2021
PAC-Bayes, MAC-Bayes and Conditional Mutual Information: Fast rate bounds that handle general VC classesPeter Grünwald, Thomas Steinke, Lydia Zakynthinou
We give a novel, unified derivation of conditional PAC-Bayesian and mutual information (MI) generalization bounds. We derive conditional MI bounds as an instance, with special choice of prior, of conditional MAC-Bayesian (Mean Approximately Correct) bounds, itself derived from conditional PAC-Bayesian bounds, where `conditional' means that one can use priors conditioned on a joint training and ghost sample. This allows us to get nontrivial PAC-Bayes and MI-style bounds for general VC classes, something recently shown to be impossible with standard PAC-Bayesian/MI bounds. Second, it allows us to get faster rates of order $O \left(({\text{KL}}/n)^γ\right)$ for $γ> 1/2$ if a Bernstein condition holds and for exp-concave losses (with $γ=1$), which is impossible with both standard PAC-Bayes generalization and MI bounds. Our work extends the recent work by Steinke and Zakynthinou [2020] who handle MI with VC but neither PAC-Bayes nor fast rates, the recent work of Hellström and Durisi [2020] who extend the latter to the PAC-Bayes setting via a unifying exponential inequality, and Mhammedi et al. [2019] who initiated fast rate PAC-Bayes generalization error bounds but handle neither MI nor general VC classes.
CRMay 28, 2021
Privately Learning SubspacesVikrant Singhal, Thomas Steinke
Private data analysis suffers a costly curse of dimensionality. However, the data often has an underlying low-dimensional structure. For example, when optimizing via gradient descent, the gradients often lie in or near a low-dimensional subspace. If that low-dimensional structure can be identified, then we can avoid paying (in terms of privacy or accuracy) for the high ambient dimension. We present differentially private algorithms that take input data sampled from a low-dimensional linear subspace (possibly with a small amount of error) and output that subspace (or an approximation to it). These algorithms can serve as a pre-processing step for other procedures.
CRMay 15, 2021
The Permute-and-Flip Mechanism is Identical to Report-Noisy-Max with Exponential NoiseZeyu Ding, Daniel Kifer, Sayed M. Saghaian N. E. et al.
The permute-and-flip mechanism is a recently proposed differentially private selection algorithm that was shown to outperform the exponential mechanism. In this paper, we show that permute-and-flip is equivalent to the well-known report noisy max algorithm with exponential noise.
LGFeb 17, 2021
Leveraging Public Data for Practical Private Query ReleaseTerrance Liu, Giuseppe Vietri, Thomas Steinke et al.
In many statistical problems, incorporating priors can significantly improve performance. However, the use of prior knowledge in differentially private query release has remained underexplored, despite such priors commonly being available in the form of public datasets, such as previous US Census releases. With the goal of releasing statistics about a private dataset, we present PMW^Pub, which -- unlike existing baselines -- leverages public data drawn from a related distribution as prior information. We provide a theoretical analysis and an empirical evaluation on the American Community Survey (ACS) and ADULT datasets, which shows that our method outperforms state-of-the-art methods. Furthermore, PMW^Pub scales well to high-dimensional data domains, where running many existing methods would be computationally infeasible.
LGFeb 12, 2021
The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure AggregationPeter Kairouz, Ziyu Liu, Thomas Steinke
We consider training models on private data that are distributed across user devices. To ensure privacy, we add on-device noise and use secure aggregation so that only the noisy sum is revealed to the server. We present a comprehensive end-to-end system, which appropriately discretizes the data and adds discrete Gaussian noise before performing secure aggregation. We provide a novel privacy analysis for sums of discrete Gaussians and carefully analyze the effects of data quantization and modular summation arithmetic. Our theoretical guarantees highlight the complex tension between communication, privacy, and accuracy. Our extensive experimental results demonstrate that our solution is essentially able to match the accuracy to central differential privacy with less than 16 bits of precision per value.
CRSep 11, 2020
Multi-Central Differential PrivacyThomas Steinke
Differential privacy is typically studied in the central model where a trusted "aggregator" holds the sensitive data of all the individuals and is responsible for protecting their privacy. A popular alternative is the local model in which the aggregator is untrusted and instead each individual is responsible for their own privacy. The decentralized privacy guarantee of the local model comes at a high price in statistical utility or computational complexity. Thus intermediate models such as the shuffled model and pan privacy have been studied in an attempt to attain the best of both worlds. In this note, we propose an intermediate trust model for differential privacy, which we call the multi-central model. Here there are multiple aggregators and we only assume that they do not collude nefariously. This model relaxes the trust requirements of the central model while avoiding the price of the local model. We motivate this model and provide some simple and efficient algorithms for it. We argue that this model is a promising direction for further research.
LGJul 10, 2020
New Oracle-Efficient Algorithms for Private Synthetic Data ReleaseGiuseppe Vietri, Grace Tian, Mark Bun et al.
We present three new algorithms for constructing differentially private synthetic data---a sanitized version of a sensitive dataset that approximately preserves the answers to a large collection of statistical queries. All three algorithms are \emph{oracle-efficient} in the sense that they are computationally efficient when given access to an optimization oracle. Such an oracle can be implemented using many existing (non-private) optimization tools such as sophisticated integer program solvers. While the accuracy of the synthetic data is contingent on the oracle's optimization performance, the algorithms satisfy differential privacy even in the worst case. For all three algorithms, we provide theoretical guarantees for both accuracy and privacy. Through empirical evaluation, we demonstrate that our methods scale well with both the dimensionality of the data and the number of queries. Compared to the state-of-the-art method High-Dimensional Matrix Mechanism \cite{McKennaMHM18}, our algorithms provide better accuracy in the large workload and high privacy regime (corresponding to low privacy loss $\varepsilon$).
CRJun 11, 2020
Evading Curse of Dimensionality in Unconstrained Private GLMs via Private Gradient DescentShuang Song, Thomas Steinke, Om Thakkar et al.
We revisit the well-studied problem of differentially private empirical risk minimization (ERM). We show that for unconstrained convex generalized linear models (GLMs), one can obtain an excess empirical risk of $\tilde O\left(\sqrt{\texttt{rank}}/εn\right)$, where ${\texttt{rank}}$ is the rank of the feature matrix in the GLM problem, $n$ is the number of data samples, and $ε$ is the privacy parameter. This bound is attained via differentially private gradient descent (DP-GD). Furthermore, via the first lower bound for unconstrained private ERM, we show that our upper bound is tight. In sharp contrast to the constrained ERM setting, there is no dependence on the dimensionality of the ambient model space ($p$). (Notice that ${\texttt{rank}}\leq \min\{n, p\}$.) Besides, we obtain an analogous excess population risk bound which depends on ${\texttt{rank}}$ instead of $p$. For the smooth non-convex GLM setting (i.e., where the objective function is non-convex but preserves the GLM structure), we further show that DP-GD attains a dimension-independent convergence of $\tilde O\left(\sqrt{\texttt{rank}}/εn\right)$ to a first-order-stationary-point of the underlying objective. Finally, we show that for convex GLMs, a variant of DP-GD commonly used in practice (which involves clipping the individual gradients) also exhibits the same dimension-independent convergence to the minimum of a well-defined objective. To that end, we provide a structural lemma that characterizes the effect of clipping on the optimization profile of DP-GD.
DSMar 31, 2020
The Discrete Gaussian for Differential PrivacyClément L. Canonne, Gautam Kamath, Thomas Steinke
A key tool for building differentially private systems is adding Gaussian noise to the output of a function evaluated on a sensitive dataset. Unfortunately, using a continuous distribution presents several practical challenges. First and foremost, finite computers cannot exactly represent samples from continuous distributions, and previous work has demonstrated that seemingly innocuous numerical errors can entirely destroy privacy. Moreover, when the underlying data is itself discrete (e.g., population counts), adding continuous noise makes the result less interpretable. With these shortcomings in mind, we introduce and analyze the discrete Gaussian in the context of differential privacy. Specifically, we theoretically and experimentally show that adding discrete Gaussian noise provides essentially the same privacy and accuracy guarantees as the addition of continuous Gaussian noise. We also present an simple and efficient algorithm for exact sampling from this distribution. This demonstrates its applicability for privately answering counting queries, or more generally, low-sensitivity integer-valued queries.
LGJan 24, 2020
Reasoning About Generalization via Conditional Mutual InformationThomas Steinke, Lydia Zakynthinou
We provide an information-theoretic framework for studying the generalization properties of machine learning algorithms. Our framework ties together existing approaches, including uniform convergence bounds and recent methods for adaptive data analysis. Specifically, we use Conditional Mutual Information (CMI) to quantify how well the input (i.e., the training data) can be recognized given the output (i.e., the trained model) of the learning algorithm. We show that bounds on CMI can be obtained from VC dimension, compression schemes, differential privacy, and other methods. We then show that bounded CMI implies various forms of generalization.
STJun 6, 2019
Average-Case Averages: Private Algorithms for Smooth Sensitivity and Mean EstimationMark Bun, Thomas Steinke
The simplest and most widely applied method for guaranteeing differential privacy is to add instance-independent noise to a statistic of interest that is scaled to its global sensitivity. However, global sensitivity is a worst-case notion that is often too conservative for realized dataset instances. We provide methods for scaling noise in an instance-dependent way and demonstrate that they provide greater accuracy under average-case distributional assumptions. Specifically, we consider the basic problem of privately estimating the mean of a real distribution from i.i.d.~samples. The standard empirical mean estimator can have arbitrarily-high global sensitivity. We propose the trimmed mean estimator, which interpolates between the mean and the median, as a way of attaining much lower sensitivity on average while losing very little in terms of statistical accuracy. To privately estimate the trimmed mean, we revisit the smooth sensitivity framework of Nissim, Raskhodnikova, and Smith (STOC 2007), which provides a framework for using instance-dependent sensitivity. We propose three new additive noise distributions which provide concentrated differential privacy when scaled to smooth sensitivity. We provide theoretical and experimental evidence showing that our noise distributions compare favorably to others in the literature, in particular, when applied to the mean estimation problem.
DSMay 30, 2019
Private Hypothesis SelectionMark Bun, Gautam Kamath, Thomas Steinke et al.
We provide a differentially private algorithm for hypothesis selection. Given samples from an unknown probability distribution $P$ and a set of $m$ probability distributions $\mathcal{H}$, the goal is to output, in a $\varepsilon$-differentially private manner, a distribution from $\mathcal{H}$ whose total variation distance to $P$ is comparable to that of the best such distribution (which we denote by $α$). The sample complexity of our basic algorithm is $O\left(\frac{\log m}{α^2} + \frac{\log m}{α\varepsilon}\right)$, representing a minimal cost for privacy when compared to the non-private algorithm. We also can handle infinite hypothesis classes $\mathcal{H}$ by relaxing to $(\varepsilon,δ)$-differential privacy. We apply our hypothesis selection algorithm to give learning algorithms for a number of natural distribution classes, including Gaussians, product distributions, sums of independent random variables, piecewise polynomials, and mixture classes. Our hypothesis selection procedure allows us to generically convert a cover for a class to a learning algorithm, complementing known learning lower bounds which are in terms of the size of the packing number of the class. As the covering and packing numbers are often closely related, for constant $α$, our algorithms achieve the optimal sample complexity for many classes of interest. Finally, we describe an application to private distribution-free PAC learning.
LGDec 7, 2018
A Hybrid Approach to Privacy-Preserving Federated LearningStacey Truex, Nathalie Baracaldo, Ali Anwar et al.
Federated learning facilitates the collaborative training of models without the sharing of raw data. However, recent attacks demonstrate that simply maintaining data locality during training processes does not provide sufficient privacy guarantees. Rather, we need a federated learning system capable of preventing inference over both the messages exchanged during training and the final trained model while ensuring the resulting model also has acceptable predictive accuracy. Existing federated learning approaches either use secure multiparty computation (SMC) which is vulnerable to inference or differential privacy which can lead to low accuracy given a large number of parties with relatively small amounts of data each. In this paper, we present an alternative approach that utilizes both differential privacy and SMC to balance these trade-offs. Combining differential privacy with secure multiparty computation enables us to reduce the growth of noise injection as the number of parties increases without sacrificing privacy while maintaining a pre-defined rate of trust. Our system is therefore a scalable approach that protects against inference threats and produces models with high accuracy. Additionally, our system can be used to train a variety of machine learning models, which we validate with experimental results on 3 different machine learning algorithms. Our experiments demonstrate that our approach out-performs state of the art solutions.
LGJun 15, 2018
The Limits of Post-Selection GeneralizationKobbi Nissim, Adam Smith, Thomas Steinke et al.
While statistics and machine learning offers numerous methods for ensuring generalization, these methods often fail in the presence of adaptivity---the common practice in which the choice of analysis depends on previous interactions with the same dataset. A recent line of work has introduced powerful, general purpose algorithms that ensure post hoc generalization (also called robust or post-selection generalization), which says that, given the output of the algorithm, it is hard to find any statistic for which the data differs significantly from the population it came from. In this work we show several limitations on the power of algorithms satisfying post hoc generalization. First, we show a tight lower bound on the error of any algorithm that satisfies post hoc generalization and answers adaptively chosen statistical queries, showing a strong barrier to progress in post selection data analysis. Second, we show that post hoc generalization is not closed under composition, despite many examples of such algorithms exhibiting strong composition properties.
LGDec 19, 2017
Calibrating Noise to Variance in Adaptive Data AnalysisVitaly Feldman, Thomas Steinke
Datasets are often used multiple times and each successive analysis may depend on the outcome of previous analyses. Standard techniques for ensuring generalization and statistical validity do not account for this adaptive dependence. A recent line of work studies the challenges that arise from such adaptive data reuse by considering the problem of answering a sequence of "queries" about the data distribution where each query may depend arbitrarily on answers to previous queries. The strongest results obtained for this problem rely on differential privacy -- a strong notion of algorithmic stability with the important property that it "composes" well when data is reused. However the notion is rather strict, as it requires stability under replacement of an arbitrary data element. The simplest algorithm is to add Gaussian (or Laplace) noise to distort the empirical answers. However, analysing this technique using differential privacy yields suboptimal accuracy guarantees when the queries have low variance. Here we propose a relaxed notion of stability that also composes adaptively. We demonstrate that a simple and natural algorithm based on adding noise scaled to the standard deviation of the query provides our notion of stability. This implies an algorithm that can answer statistical queries about the dataset with substantially improved accuracy guarantees for low-variance queries. The only previous approach that provides such accuracy guarantees is based on a more involved differentially private median-of-means algorithm and its analysis exploits stronger "group" stability of the algorithm.
LGJun 15, 2017
Generalization for Adaptively-chosen Estimators via Stable MedianVitaly Feldman, Thomas Steinke
Datasets are often reused to perform multiple statistical analyses in an adaptive way, in which each analysis may depend on the outcomes of previous analyses on the same dataset. Standard statistical guarantees do not account for these dependencies and little is known about how to provably avoid overfitting and false discovery in the adaptive setting. We consider a natural formalization of this problem in which the goal is to design an algorithm that, given a limited number of i.i.d.~samples from an unknown distribution, can answer adaptively-chosen queries about that distribution. We present an algorithm that estimates the expectations of $k$ arbitrary adaptively-chosen real-valued estimators using a number of samples that scales as $\sqrt{k}$. The answers given by our algorithm are essentially as accurate as if fresh samples were used to evaluate each estimator. In contrast, prior work yields error guarantees that scale with the worst-case sensitivity of each estimator. We also give a version of our algorithm that can be used to verify answers to such queries where the sample complexity depends logarithmically on the number of queries $k$ (as in the reusable holdout technique). Our algorithm is based on a simple approximate median algorithm that satisfies the strong stability guarantees of differential privacy. Our techniques provide a new approach for analyzing the generalization guarantees of differentially private algorithms.
DSApr 10, 2017
Tight Lower Bounds for Differentially Private SelectionThomas Steinke, Jonathan Ullman
A pervasive task in the differential privacy literature is to select the $k$ items of "highest quality" out of a set of $d$ items, where the quality of each item depends on a sensitive dataset that must be protected. Variants of this task arise naturally in fundamental problems like feature selection and hypothesis testing, and also as subroutines for many sophisticated differentially private algorithms. The standard approaches to these tasks---repeated use of the exponential mechanism or the sparse vector technique---approximately solve this problem given a dataset of $n = O(\sqrt{k}\log d)$ samples. We provide a tight lower bound for some very simple variants of the private selection problem. Our lower bound shows that a sample of size $n = Ω(\sqrt{k} \log d)$ is required even to achieve a very minimal accuracy guarantee. Our results are based on an extension of the fingerprinting method to sparse selection problems. Previously, the fingerprinting method has been used to provide tight lower bounds for answering an entire set of $d$ queries, but often only some much smaller set of $k$ queries are relevant. Our extension allows us to prove lower bounds that depend on both the number of relevant queries and the total number of queries.
CRMay 6, 2016
Concentrated Differential Privacy: Simplifications, Extensions, and Lower BoundsMark Bun, Thomas Steinke
"Concentrated differential privacy" was recently introduced by Dwork and Rothblum as a relaxation of differential privacy, which permits sharper analyses of many privacy-preserving computations. We present an alternative formulation of the concept of concentrated differential privacy in terms of the Renyi divergence between the distributions obtained by running an algorithm on neighboring inputs. With this reformulation in hand, we prove sharper quantitative results, establish lower bounds, and raise a few new questions. We also unify this approach with approximate differential privacy by giving an appropriate definition of "approximate concentrated differential privacy."
CRApr 15, 2016
Make Up Your Mind: The Price of Online Queries in Differential PrivacyMark Bun, Thomas Steinke, Jonathan Ullman
We consider the problem of answering queries about a sensitive dataset subject to differential privacy. The queries may be chosen adversarially from a larger set Q of allowable queries in one of three ways, which we list in order from easiest to hardest to answer: Offline: The queries are chosen all at once and the differentially private mechanism answers the queries in a single batch. Online: The queries are chosen all at once, but the mechanism only receives the queries in a streaming fashion and must answer each query before seeing the next query. Adaptive: The queries are chosen one at a time and the mechanism must answer each query before the next query is chosen. In particular, each query may depend on the answers given to previous queries. Many differentially private mechanisms are just as efficient in the adaptive model as they are in the offline model. Meanwhile, most lower bounds for differential privacy hold in the offline setting. This suggests that the three models may be equivalent. We prove that these models are all, in fact, distinct. Specifically, we show that there is a family of statistical queries such that exponentially more queries from this family can be answered in the offline model than in the online model. We also exhibit a family of search queries such that exponentially more queries from this family can be answered in the online model than in the adaptive model. We also investigate whether such separations might hold for simple queries like threshold queries over the real line.
LGNov 8, 2015
Algorithmic Stability for Adaptive Data AnalysisRaef Bassily, Kobbi Nissim, Adam Smith et al.
Adaptivity is an important feature of data analysis---the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC, 2015) and Hardt and Ullman (FOCS, 2014) initiated the formal study of this problem, and gave the first upper and lower bounds on the achievable generalization error for adaptive data analysis. Specifically, suppose there is an unknown distribution $\mathbf{P}$ and a set of $n$ independent samples $\mathbf{x}$ is drawn from $\mathbf{P}$. We seek an algorithm that, given $\mathbf{x}$ as input, accurately answers a sequence of adaptively chosen queries about the unknown distribution $\mathbf{P}$. How many samples $n$ must we draw from the distribution, as a function of the type of queries, the number of queries, and the desired level of accuracy? In this work we make two new contributions: (i) We give upper bounds on the number of samples $n$ that are needed to answer statistical queries. The bounds improve and simplify the work of Dwork et al. (STOC, 2015), and have been applied in subsequent work by those authors (Science, 2015, NIPS, 2015). (ii) We prove the first upper bounds on the number of samples required to answer more general families of queries. These include arbitrary low-sensitivity queries and an important class of optimization queries. As in Dwork et al., our algorithms are based on a connection with algorithmic stability in the form of differential privacy. We extend their work by giving a quantitatively optimal, more general, and simpler proof of their main theorem that stability implies low generalization error. We also study weaker stability guarantees such as bounded KL divergence and total variation distance.
LGMar 16, 2015
More General Queries and Less Generalization Error in Adaptive Data AnalysisRaef Bassily, Adam Smith, Thomas Steinke et al.
Adaptivity is an important feature of data analysis---typically the choice of questions asked about a dataset depends on previous interactions with the same dataset. However, generalization error is typically bounded in a non-adaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC '15) and Hardt and Ullman (FOCS '14) initiated the formal study of this problem, and gave the first upper and lower bounds on the achievable generalization error for adaptive data analysis. Specifically, suppose there is an unknown distribution $\mathcal{P}$ and a set of $n$ independent samples $x$ is drawn from $\mathcal{P}$. We seek an algorithm that, given $x$ as input, "accurately" answers a sequence of adaptively chosen "queries" about the unknown distribution $\mathcal{P}$. How many samples $n$ must we draw from the distribution, as a function of the type of queries, the number of queries, and the desired level of accuracy? In this work we make two new contributions towards resolving this question: *We give upper bounds on the number of samples $n$ that are needed to answer statistical queries that improve over the bounds of Dwork et al. *We prove the first upper bounds on the number of samples required to answer more general families of queries. These include arbitrary low-sensitivity queries and the important class of convex risk minimization queries. As in Dwork et al., our algorithms are based on a connection between differential privacy and generalization error, but we feel that our analysis is simpler and more modular, which may be useful for studying these questions in the future.
DSJan 24, 2015
Between Pure and Approximate Differential PrivacyThomas Steinke, Jonathan Ullman
We show a new lower bound on the sample complexity of $(\varepsilon, δ)$-differentially private algorithms that accurately answer statistical queries on high-dimensional databases. The novelty of our bound is that it depends optimally on the parameter $δ$, which loosely corresponds to the probability that the algorithm fails to be private, and is the first to smoothly interpolate between approximate differential privacy ($δ> 0$) and pure differential privacy ($δ= 0$). Specifically, we consider a database $D \in \{\pm1\}^{n \times d}$ and its \emph{one-way marginals}, which are the $d$ queries of the form "What fraction of individual records have the $i$-th bit set to $+1$?" We show that in order to answer all of these queries to within error $\pm α$ (on average) while satisfying $(\varepsilon, δ)$-differential privacy, it is necessary that $$ n \geq Ω\left( \frac{\sqrt{d \log(1/δ)}}{α\varepsilon} \right), $$ which is optimal up to constant factors. To prove our lower bound, we build on the connection between \emph{fingerprinting codes} and lower bounds in differential privacy (Bun, Ullman, and Vadhan, STOC'14). In addition to our lower bound, we give new purely and approximately differentially private algorithms for answering arbitrary statistical queries that improve on the sample complexity of the standard Laplace and Gaussian mechanisms for achieving worst-case accuracy guarantees by a logarithmic factor.
CCDec 8, 2014
Weighted Polynomial Approximations: Limits for Learning and PseudorandomnessMark Bun, Thomas Steinke
Polynomial approximations to boolean functions have led to many positive results in computer science. In particular, polynomial approximations to the sign function underly algorithms for agnostically learning halfspaces, as well as pseudorandom generators for halfspaces. In this work, we investigate the limits of these techniques by proving inapproximability results for the sign function. Firstly, the polynomial regression algorithm of Kalai et al. (SIAM J. Comput. 2008) shows that halfspaces can be learned with respect to log-concave distributions on $\mathbb{R}^n$ in the challenging agnostic learning model. The power of this algorithm relies on the fact that under log-concave distributions, halfspaces can be approximated arbitrarily well by low-degree polynomials. We ask whether this technique can be extended beyond log-concave distributions, and establish a negative result. We show that polynomials of any degree cannot approximate the sign function to within arbitrarily low error for a large class of non-log-concave distributions on the real line, including those with densities proportional to $\exp(-|x|^{0.99})$. Secondly, we investigate the derandomization of Chernoff-type concentration inequalities. Chernoff-type tail bounds on sums of independent random variables have pervasive applications in theoretical computer science. Schmidt et al. (SIAM J. Discrete Math. 1995) showed that these inequalities can be established for sums of random variables with only $O(\log(1/δ))$-wise independence, for a tail probability of $δ$. We show that their results are tight up to constant factors. These results rely on techniques from weighted approximation theory, which studies how well functions on the real line can be approximated by polynomials under various distributions. We believe that these techniques will have further applications in other areas of computer science.
CROct 5, 2014
Interactive Fingerprinting Codes and the Hardness of Preventing False DiscoveryThomas Steinke, Jonathan Ullman
We show an essentially tight bound on the number of adaptively chosen statistical queries that a computationally efficient algorithm can answer accurately given $n$ samples from an unknown distribution. A statistical query asks for the expectation of a predicate over the underlying distribution, and an answer to a statistical query is accurate if it is "close" to the correct expectation over the distribution. This question was recently studied by Dwork et al., who showed how to answer $\tildeΩ(n^2)$ queries efficiently, and also by Hardt and Ullman, who showed that answering $\tilde{O}(n^3)$ queries is hard. We close the gap between the two bounds and show that, under a standard hardness assumption, there is no computationally efficient algorithm that, given $n$ samples from an unknown distribution, can give valid answers to $O(n^2)$ adaptively chosen statistical queries. An implication of our results is that computationally efficient algorithms for answering arbitrary, adaptively chosen statistical queries may as well be differentially private. We obtain our results using a new connection between the problem of answering adaptively chosen statistical queries and a combinatorial object called an interactive fingerprinting code. In order to optimize our hardness result, we give a new Fourier-analytic approach to analyzing fingerprinting codes that is simpler, more flexible, and yields better parameters than previous constructions.