Ayfer Özgür

LG
9papers
8,779citations
Novelty61%
AI Score50

9 Papers

LGJul 12, 2023
Differentially Private Decoupled Graph Convolutions for Multigranular Topology Protection

Eli Chien, Wei-Ning Chen, Chao Pan et al.

GNNs can inadvertently expose sensitive user information and interactions through their model predictions. To address these privacy concerns, Differential Privacy (DP) protocols are employed to control the trade-off between provable privacy protection and model utility. Applying standard DP approaches to GNNs directly is not advisable due to two main reasons. First, the prediction of node labels, which relies on neighboring node attributes through graph convolutions, can lead to privacy leakage. Second, in practical applications, the privacy requirements for node attributes and graph topology may differ. In the latter setting, existing DP-GNN models fail to provide multigranular trade-offs between graph topology privacy, node attribute privacy, and GNN utility. To address both limitations, we propose a new framework termed Graph Differential Privacy (GDP), specifically tailored to graph learning. GDP ensures both provably private model parameters as well as private predictions. Additionally, we describe a novel unified notion of graph dataset adjacency to analyze the properties of GDP for different levels of graph topology privacy. Our findings reveal that DP-GNNs, which rely on graph convolutions, not only fail to meet the requirements for multigranular graph topology privacy but also necessitate the injection of DP noise that scales at least linearly with the maximum node degree. In contrast, our proposed Differentially Private Decoupled Graph Convolutions (DPDGCs) represent a more flexible and efficient alternative to graph convolutions that still provides the necessary guarantees of GDP. To validate our approach, we conducted extensive experiments on seven node classification benchmarking and illustrative synthetic datasets. The results demonstrate that DPDGCs significantly outperform existing DP-GNNs in terms of privacy-utility trade-offs.

CRJul 9, 2022
The Poisson binomial mechanism for secure and private federated learning

Wei-Ning Chen, Ayfer Özgür, Peter Kairouz

We introduce the Poisson Binomial mechanism (PBM), a discrete differential privacy mechanism for distributed mean estimation (DME) with applications to federated learning and analytics. We provide a tight analysis of its privacy guarantees, showing that it achieves the same privacy-accuracy trade-offs as the continuous Gaussian mechanism. Our analysis is based on a novel bound on the Rényi divergence of two Poisson binomial distributions that may be of independent interest. Unlike previous discrete DP schemes based on additive noise, our mechanism encodes local information into a parameter of the binomial distribution, and hence the output distribution is discrete with bounded support. Moreover, the support does not increase as the privacy budget $\varepsilon \rightarrow 0$ as in the case of additive schemes which require the addition of more noise to achieve higher privacy; on the contrary, the support becomes smaller as $\varepsilon \rightarrow 0$. The bounded support enables us to combine our mechanism with secure aggregation (SecAgg), a multi-party cryptographic protocol, without the need of performing modular clipping which results in an unbiased estimator of the sum of the local vectors. This in turn allows us to apply it in the private FL setting and provide an upper bound on the convergence rate of the SGD algorithm. Moreover, since the support of the output distribution becomes smaller as $\varepsilon \rightarrow 0$, the communication cost of our scheme decreases with the privacy constraint $\varepsilon$, outperforming all previous distributed DP schemes based on additive noise in the high privacy or low communication regimes.

LGJun 15, 2023
Training generative models from privatized data

Daria Reshetova, Wei-Ning Chen, Ayfer Özgür

Local differential privacy is a powerful method for privacy-preserving data collection. In this paper, we develop a framework for training Generative Adversarial Networks (GANs) on differentially privatized data. We show that entropic regularization of optimal transport - a popular regularization method in the literature that has often been leveraged for its computational benefits - enables the generator to learn the raw (unprivatized) data distribution even though it only has access to privatized samples. We prove that at the same time this leads to fast statistical convergence at the parametric rate. This shows that entropic regularization of optimal transport uniquely enables the mitigation of both the effects of privatization noise and the curse of dimensionality in statistical convergence. We provide experimental evidence to support the efficacy of our framework in practice.

LGMay 8Code
Less Random, More Private: What is the Optimal Subsampling Scheme for DP-SGD?

Andy Dong, Ayfer Özgür

Poisson subsampling is the default sampling scheme in differentially private machine learning, largely because its unstructured randomness yields tractable privacy amplification analyses. Yet this same randomness introduces substantial participation variance: each sample appears in very different numbers of training iterations. In this work, we show that this variance is not merely a practical artifact to be tolerated, but a fundamental source of suboptimal privacy amplification. We prove that Balanced Iteration Subsampling (BIS), a structured scheme in which each sample participates in exactly a fixed number of iterations, achieves stronger privacy amplification than Poisson subsampling and is optimal at both extremes of the noise spectrum ($σ\to 0$ and $σ\to \infty$). Our analysis reveals that the privacy-noise tradeoff is governed not by maximizing randomness, but by eliminating participation variance while preserving uniform marginal participation across iterations. To translate this asymptotic theory into finite-noise guarantees, we introduce a practical near-exact Monte Carlo accountant for BIS, which removes the analytical slack of existing RDP and composition-based PLD analyses. Evaluations across more than 60 practical DP-SGD configurations show that BIS consistently outperforms Poisson subsampling in the low-noise regimes most relevant for high-utility private training, reducing the required noise multiplier by up to $9.6\%$. These results overturn the common intuition that more sampling randomness necessarily yields stronger privacy amplification: in DP-SGD, structured participation can be both more practical and more private. Our implementation is available at https://github.com/dong-xin-ao-andy/bis-mc-accountant.

MLJun 16, 2021
Breaking The Dimension Dependence in Sparse Distribution Estimation under Communication Constraints

Wei-Ning Chen, Peter Kairouz, Ayfer Özgür

We consider the problem of estimating a $d$-dimensional $s$-sparse discrete distribution from its samples observed under a $b$-bit communication constraint. The best-known previous result on $\ell_2$ estimation error for this problem is $O\left( \frac{s\log\left( {d}/{s}\right)}{n2^b}\right)$. Surprisingly, we show that when sample size $n$ exceeds a minimum threshold $n^*(s, d, b)$, we can achieve an $\ell_2$ estimation error of $O\left( \frac{s}{n2^b}\right)$. This implies that when $n>n^*(s, d, b)$ the convergence rate does not depend on the ambient dimension $d$ and is the same as knowing the support of the distribution beforehand. We next ask the question: ``what is the minimum $n^*(s, d, b)$ that allows dimension-free convergence?''. To upper bound $n^*(s, d, b)$, we develop novel localization schemes to accurately and efficiently localize the unknown support. For the non-interactive setting, we show that $n^*(s, d, b) = O\left( \min \left( {d^2\log^2 d}/{2^b}, {s^4\log^2 d}/{2^b}\right) \right)$. Moreover, we connect the problem with non-adaptive group testing and obtain a polynomial-time estimation scheme when $n = \tildeΩ\left({s^4\log^4 d}/{2^b}\right)$. This group testing based scheme is adaptive to the sparsity parameter $s$, and hence can be applied without knowing it. For the interactive setting, we propose a novel tree-based estimation scheme and show that the minimum sample-size needed to achieve dimension-free convergence can be further reduced to $n^*(s, d, b) = \tilde{O}\left( {s^2\log^2 d}/{2^b} \right)$.

LGJul 22, 2020
Breaking the Communication-Privacy-Accuracy Trilemma

Wei-Ning Chen, Peter Kairouz, Ayfer Özgür

Two major challenges in distributed learning and estimation are 1) preserving the privacy of the local samples; and 2) communicating them efficiently to a central server, while achieving high accuracy for the end-to-end task. While there has been significant interest in addressing each of these challenges separately in the recent literature, treatments that simultaneously address both challenges are still largely missing. In this paper, we develop novel encoding and decoding mechanisms that simultaneously achieve optimal privacy and communication efficiency in various canonical settings. In particular, we consider the problems of mean estimation and frequency estimation under $\varepsilon$-local differential privacy and $b$-bit communication constraints. For mean estimation, we propose a scheme based on Kashin's representation and random sampling, with order-optimal estimation error under both constraints. For frequency estimation, we present a mechanism that leverages the recursive structure of Walsh-Hadamard matrices and achieves order-optimal estimation error for all privacy levels and communication budgets. As a by-product, we also construct a distribution estimation mechanism that is rate-optimal for all privacy regimes and communication constraints, extending recent work that is limited to $b=1$ and $\varepsilon=O(1)$. Our results demonstrate that intelligent encoding under joint privacy and communication constraints can yield a performance that matches the optimal accuracy achievable under either constraint alone.

OCJun 15, 2020
Lower Bounds and a Near-Optimal Shrinkage Estimator for Least Squares using Random Projections

Srivatsan Sridhar, Mert Pilanci, Ayfer Özgür

In this work, we consider the deterministic optimization using random projections as a statistical estimation problem, where the squared distance between the predictions from the estimator and the true solution is the error metric. In approximately solving a large scale least squares problem using Gaussian sketches, we show that the sketched solution has a conditional Gaussian distribution with the true solution as its mean. Firstly, tight worst case error lower bounds with explicit constants are derived for any estimator using the Gaussian sketch, and the classical sketching is shown to be the optimal unbiased estimator. For biased estimators, the lower bound also incorporates prior knowledge about the true solution. Secondly, we use the James-Stein estimator to derive an improved estimator for the least squares solution using the Gaussian sketch. An upper bound on the expected error of this estimator is derived, which is smaller than the error of the classical Gaussian sketch solution for any given data. The upper and lower bounds match when the SNR of the true solution is known to be small and the data matrix is well conditioned. Empirically, this estimator achieves smaller error on simulated and real datasets, and works for other common sketching methods as well.

LGDec 10, 2019
Advances and Open Problems in Federated Learning

Peter Kairouz, H. Brendan McMahan, Brendan Avent et al.

Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.

ITJun 10, 2016
Phase Retrieval via Incremental Truncated Wirtinger Flow

Ritesh Kolte, Ayfer Özgür

In the phase retrieval problem, an unknown vector is to be recovered given quadratic measurements. This problem has received considerable attention in recent times. In this paper, we present an algorithm to solve a nonconvex formulation of the phase retrieval problem, that we call $\textit{Incremental Truncated Wirtinger Flow}$. Given random Gaussian sensing vectors, we prove that it converges linearly to the solution, with an optimal sample complexity. We also provide stability guarantees of the algorithm under noisy measurements. Performance and comparisons with existing algorithms are illustrated via numerical experiments on simulated and real data, with both random and structured sensing vectors.