ITJan 21
Optimality of Staircase Mechanisms for Vector Queries under Differential PrivacyJames Melbourne, Mario Diaz, Shahab Asoodeh
We study the optimal design of additive mechanisms for vector-valued queries under $ε$-differential privacy (DP). Given only the sensitivity of a query and a norm-monotone cost function measuring utility loss, we ask which noise distribution minimizes expected cost among all additive $ε$-DP mechanisms. Using convex rearrangement theory, we show that this infinite-dimensional optimization problem admits a reduction to a one-dimensional compact and convex family of radially symmetric distributions whose extreme points are the staircase distributions. As a consequence, we prove that for any dimension, any norm, and any norm-monotone cost function, there exists an $ε$-DP staircase mechanism that is optimal among all additive mechanisms. This result resolves a conjecture of Geng, Kairouz, Oh, and Viswanath, and provides a geometric explanation for the emergence of staircase mechanisms as extremal solutions in differential privacy.
LGNov 13, 2024
Locally Private Sampling with Public DataBehnoosh Zamanlooy, Mario Diaz, Shahab Asoodeh
Local differential privacy (LDP) is increasingly employed in privacy-preserving machine learning to protect user data before sharing it with an untrusted aggregator. Most LDP methods assume that users possess only a single data record, which is a significant limitation since users often gather extensive datasets (e.g., images, text, time-series data) and frequently have access to public datasets. To address this limitation, we propose a locally private sampling framework that leverages both the private and public datasets of each user. Specifically, we assume each user has two distributions: $p$ and $q$ that represent their private dataset and the public dataset, respectively. The objective is to design a mechanism that generates a private sample approximating $p$ while simultaneously preserving $q$. We frame this objective as a minimax optimization problem using $f$-divergence as the utility measure. We fully characterize the minimax optimal mechanisms for general $f$-divergences provided that $p$ and $q$ are discrete distributions. Remarkably, we demonstrate that this optimal mechanism is universal across all $f$-divergences. Experiments validate the effectiveness of our minimax optimal sampler compared to the state-of-the-art locally private sampler.
MLMay 13, 2025
Lower Bounds on the MMSE of Adversarially Inferring Sensitive FeaturesMonica Welfert, Nathan Stromberg, Mario Diaz et al.
We propose an adversarial evaluation framework for sensitive feature inference based on minimum mean-squared error (MMSE) estimation with a finite sample size and linear predictive models. Our approach establishes theoretical lower bounds on the true MMSE of inferring sensitive features from noisy observations of other correlated features. These bounds are expressed in terms of the empirical MMSE under a restricted hypothesis class and a non-negative error term. The error term captures both the estimation error due to finite number of samples and the approximation error from using a restricted hypothesis class. For linear predictive models, we derive closed-form bounds, which are order optimal in terms of the noise variance, on the approximation error for several classes of relationships between the sensitive and non-sensitive features, including linear mappings, binary symmetric channels, and class-conditional multi-variate Gaussian distributions. We also present a new lower bound that relies on the MSE computed on a hold-out validation dataset of the MMSE estimator learned on finite-samples and a restricted hypothesis class. Through empirical evaluation, we demonstrate that our framework serves as an effective tool for MMSE-based adversarial evaluation of sensitive feature inference that balances theoretical guarantees with practical efficiency.
LGMay 17, 2023
Privacy Loss of Noisy Stochastic Gradient Descent Might Converge Even for Non-Convex LossesShahab Asoodeh, Mario Diaz
The Noisy-SGD algorithm is widely used for privately training machine learning models. Traditional privacy analyses of this algorithm assume that the internal state is publicly revealed, resulting in privacy loss bounds that increase indefinitely with the number of iterations. However, recent findings have shown that if the internal state remains hidden, then the privacy loss might remain bounded. Nevertheless, this remarkable result heavily relies on the assumption of (strong) convexity of the loss function. It remains an important open problem to further relax this condition while proving similar convergent upper bounds on the privacy loss. In this work, we address this problem for DP-SGD, a popular variant of Noisy-SGD that incorporates gradient clipping to limit the impact of individual samples on the training process. Our findings demonstrate that the privacy loss of projected DP-SGD converges exponentially fast, without requiring convexity or smoothness assumptions on the loss function. In addition, we analyze the privacy loss of regularized (unprojected) DP-SGD. To obtain these results, we directly analyze the hockey-stick divergence between coupled stochastic processes by relying on non-linear data processing inequalities.
ITDec 20, 2020
Contraction of $E_γ$-Divergence and Its Applications to PrivacyShahab Asoodeh, Mario Diaz, Flavio P. Calmon
We investigate the contraction coefficients derived from strong data processing inequalities for the $E_γ$-divergence. By generalizing the celebrated Dobrushin's coefficient from total variation distance to $E_γ$-divergence, we derive a closed-form expression for the contraction of $E_γ$-divergence. This result has fundamental consequences in two privacy settings. First, it implies that local differential privacy can be equivalently expressed in terms of the contraction of $E_γ$-divergence. This equivalent formula can be used to precisely quantify the impact of local privacy in (Bayesian and minimax) estimation and hypothesis testing problems in terms of the reduction of effective sample size. Second, it leads to a new information-theoretic technique for analyzing privacy guarantees of online algorithms. In this technique, we view such algorithms as a composition of amplitude-constrained Gaussian channels and then relate their contraction coefficients under $E_γ$-divergence to the overall differential privacy guarantees. As an example, we apply our technique to derive the differential privacy parameters of gradient descent. Moreover, we also show that this framework can be tailored to batch learning algorithms that can be implemented with one pass over the training dataset.
LGJun 22, 2020
On the alpha-loss Landscape in the Logistic ModelTyler Sypherd, Mario Diaz, Lalitha Sankar et al.
We analyze the optimization landscape of a recently introduced tunable class of loss functions called $α$-loss, $α\in (0,\infty]$, in the logistic model. This family encapsulates the exponential loss ($α= 1/2$), the log-loss ($α= 1$), and the 0-1 loss ($α= \infty$) and contains compelling properties that enable the practitioner to discern among a host of operating conditions relevant to emerging learning methods. Specifically, we study the evolution of the optimization landscape of $α$-loss with respect to $α$ using tools drawn from the study of strictly-locally-quasi-convex functions in addition to geometric techniques. We interpret these results in terms of optimization complexity via normalized gradient descent.
LGFeb 12, 2020
To Split or Not to Split: The Impact of Disparate Treatment in ClassificationHao Wang, Hsiang Hsu, Mario Diaz et al.
Disparate treatment occurs when a machine learning model yields different decisions for individuals based on a sensitive attribute (e.g., age, sex). In domains where prediction accuracy is paramount, it could potentially be acceptable to fit a model which exhibits disparate treatment. To evaluate the effect of disparate treatment, we compare the performance of split classifiers (i.e., classifiers trained and deployed separately on each group) with group-blind classifiers (i.e., classifiers which do not use a sensitive attribute). We introduce the benefit-of-splitting for quantifying the performance improvement by splitting classifiers. Computing the benefit-of-splitting directly from its definition could be intractable since it involves solving optimization problems over an infinite-dimensional functional space. Under different performance measures, we (i) prove an equivalent expression for the benefit-of-splitting which can be efficiently computed by solving small-scale convex programs; (ii) provide sharp upper and lower bounds for the benefit-of-splitting which reveal precise conditions where a group-blind classifier will always suffer from a non-trivial performance gap from the split classifiers. In the finite sample regime, splitting is not necessarily beneficial and we provide data-dependent bounds to understand this effect. Finally, we validate our theoretical results through numerical experiments on both synthetic and real-world datasets.
ITJan 17, 2020
Privacy Amplification of Iterative Algorithms via Contraction CoefficientsShahab Asoodeh, Mario Diaz, Flavio P. Calmon
We investigate the framework of privacy amplification by iteration, recently proposed by Feldman et al., from an information-theoretic lens. We demonstrate that differential privacy guarantees of iterative mappings can be determined by a direct application of contraction coefficients derived from strong data processing inequalities for $f$-divergences. In particular, by generalizing the Dobrushin's contraction coefficient for total variation distance to an $f$-divergence known as $E_γ$-divergence, we derive tighter bounds on the differential privacy parameters of the projected noisy stochastic gradient descent algorithm with hidden intermediate updates.
MLNov 8, 2019
Theoretical Guarantees for Model Auditing with Finite AdversariesMario Diaz, Peter Kairouz, Jiachun Liao et al.
Privacy concerns have led to the development of privacy-preserving approaches for learning models from sensitive data. Yet, in practice, even models learned with privacy guarantees can inadvertently memorize unique training examples or leak sensitive features. To identify such privacy violations, existing model auditing techniques use finite adversaries defined as machine learning models with (a) access to some finite side information (e.g., a small auditing dataset), and (b) finite capacity (e.g., a fixed neural network architecture). Our work investigates the requirements under which an unsuccessful attempt to identify privacy violations by a finite adversary implies that no stronger adversary can succeed at such a task. We do so via parameters that quantify the capabilities of the finite adversary, including the size of the neural network employed by such an adversary and the amount of side information it has access to as well as the regularity of the (perhaps privacy-guaranteeing) audited model.
LGJun 5, 2019
A Tunable Loss Function for Robust Classification: Calibration, Landscape, and GeneralizationTyler Sypherd, Mario Diaz, John Kevin Cava et al.
We introduce a tunable loss function called $α$-loss, parameterized by $α\in (0,\infty]$, which interpolates between the exponential loss ($α= 1/2$), the log-loss ($α= 1$), and the 0-1 loss ($α= \infty$), for the machine learning setting of classification. Theoretically, we illustrate a fundamental connection between $α$-loss and Arimoto conditional entropy, verify the classification-calibration of $α$-loss in order to demonstrate asymptotic optimality via Rademacher complexity generalization techniques, and build-upon a notion called strictly local quasi-convexity in order to quantitatively characterize the optimization landscape of $α$-loss. Practically, we perform class imbalance, robustness, and classification experiments on benchmark image datasets using convolutional-neural-networks. Our main practical conclusion is that certain tasks may benefit from tuning $α$-loss away from log-loss ($α= 1$), and to this end we provide simple heuristics for the practitioner. In particular, navigating the $α$ hyperparameter can readily provide superior model robustness to label flips ($α> 1$) and sensitivity to imbalanced classes ($α< 1$).
LGFeb 12, 2019
A Tunable Loss Function for Binary ClassificationTyler Sypherd, Mario Diaz, Lalitha Sankar et al.
We present $α$-loss, $α\in [1,\infty]$, a tunable loss function for binary classification that bridges log-loss ($α=1$) and $0$-$1$ loss ($α= \infty$). We prove that $α$-loss has an equivalent margin-based form and is classification-calibrated, two desirable properties for a good surrogate loss function for the ideal yet intractable $0$-$1$ loss. For logistic regression-based classification, we provide an upper bound on the difference between the empirical and expected risk at the empirical risk minimizers for $α$-loss by exploiting its Lipschitzianity along with recent results on the landscape features of empirical risk functions. Finally, we show that $α$-loss with $α= 2$ performs better than log-loss on MNIST for logistic regression.
ITJan 18, 2018
The Utility Cost of Robust Privacy GuaranteesHao Wang, Mario Diaz, Flavio P. Calmon et al.
Consider a data publishing setting for a data set with public and private features. The objective of the publisher is to maximize the amount of information about the public features in a revealed data set, while keeping the information leaked about the private features bounded. The goal of this paper is to analyze the performance of privacy mechanisms that are constructed to match the distribution learned from the data set. Two distinct scenarios are considered: (i) mechanisms are designed to provide a privacy guarantee for the learned distribution; and (ii) mechanisms are designed to provide a privacy guarantee for every distribution in a given neighborhood of the learned distribution. For the first scenario, given any privacy mechanism, upper bounds on the difference between the privacy-utility guarantees for the learned and true distributions are presented. In the second scenario, upper bounds on the reduction in utility incurred by providing a uniform privacy guarantee are developed.
ITJul 8, 2017
Estimation Efficiency Under Privacy ConstraintsShahab Asoodeh, Mario Diaz, Fady Alajaji et al.
We investigate the problem of estimating a random variable $Y\in \mathcal{Y}$ under a privacy constraint dictated by another random variable $X\in \mathcal{X}$, where estimation efficiency and privacy are assessed in terms of two different loss functions. In the discrete case, we use the Hamming loss function and express the corresponding utility-privacy tradeoff in terms of the privacy-constrained guessing probability $h(P_{XY}, ε)$, the maximum probability $\mathsf{P}_\mathsf{c}(Y|Z)$ of correctly guessing $Y$ given an auxiliary random variable $Z\in \mathcal{Z}$, where the maximization is taken over all $P_{Z|Y}$ ensuring that $\mathsf{P}_\mathsf{c}(X|Z)\leq ε$ for a given privacy threshold $ε\geq 0$. We prove that $h(P_{XY}, \cdot)$ is concave and piecewise linear, which allows us to derive its expression in closed form for any $ε$ when $X$ and $Y$ are binary. In the non-binary case, we derive $h(P_{XY}, ε)$ in the high utility regime (i.e., for sufficiently large values of $ε$) under the assumption that $Z$ takes values in $\mathcal{Y}$. We also analyze the privacy-constrained guessing probability for two binary vector scenarios. When $X$ and $Y$ are continuous random variables, we use the squared-error loss function and express the corresponding utility-privacy tradeoff in terms of $\mathsf{sENSR}(P_{XY}, ε)$, which is the smallest normalized minimum mean squared-error (mmse) incurred in estimating $Y$ from its Gaussian perturbation $Z$, such that the mmse of $f(X)$ given $Z$ is within $ε$ of the variance of $f(X)$ for any non-constant real-valued function $f$. We derive tight upper and lower bounds for $\mathsf{sENSR}$ when $Y$ is Gaussian. We also obtain a tight lower bound for $\mathsf{sENSR}(P_{XY}, ε)$ for general absolutely continuous random variables when $ε$ is sufficiently small.
ITApr 12, 2017
Privacy-Aware Guessing EfficiencyShahab Asoodeh, Mario Diaz, Fady Alajaji et al.
We investigate the problem of guessing a discrete random variable $Y$ under a privacy constraint dictated by another correlated discrete random variable $X$, where both guessing efficiency and privacy are assessed in terms of the probability of correct guessing. We define $h(P_{XY}, ε)$ as the maximum probability of correctly guessing $Y$ given an auxiliary random variable $Z$, where the maximization is taken over all $P_{Z|Y}$ ensuring that the probability of correctly guessing $X$ given $Z$ does not exceed $ε$. We show that the map $ε\mapsto h(P_{XY}, ε)$ is strictly increasing, concave, and piecewise linear, which allows us to derive a closed form expression for $h(P_{XY}, ε)$ when $X$ and $Y$ are connected via a binary-input binary-output channel. For $(X^n, Y^n)$ being pairs of independent and identically distributed binary random vectors, we similarly define $\underline{h}_n(P_{X^nY^n}, ε)$ under the assumption that $Z^n$ is also a binary vector. Then we obtain a closed form expression for $\underline{h}_n(P_{X^nY^n}, ε)$ for sufficiently large, but nontrivial values of $ε$.
ITNov 7, 2015
Information Extraction Under Privacy ConstraintsShahab Asoodeh, Mario Diaz, Fady Alajaji et al.
A privacy-constrained information extraction problem is considered where for a pair of correlated discrete random variables $(X,Y)$ governed by a given joint distribution, an agent observes $Y$ and wants to convey to a potentially public user as much information about $Y$ as possible without compromising the amount of information revealed about $X$. To this end, the so-called {\em rate-privacy function} is introduced to quantify the maximal amount of information (measured in terms of mutual information) that can be extracted from $Y$ under a privacy constraint between $X$ and the extracted information, where privacy is measured using either mutual information or maximal correlation. Properties of the rate-privacy function are analyzed and information-theoretic and estimation-theoretic interpretations of it are presented for both the mutual information and maximal correlation privacy measures. It is also shown that the rate-privacy function admits a closed-form expression for a large family of joint distributions of $(X,Y)$. Finally, the rate-privacy function under the mutual information privacy measure is considered for the case where $(X,Y)$ has a joint probability density function by studying the problem where the extracted information is a uniform quantization of $Y$ corrupted by additive Gaussian noise. The asymptotic behavior of the rate-privacy function is studied as the quantization resolution grows without bound and it is observed that not all of the properties of the rate-privacy function carry over from the discrete to the continuous case.