Ergute Bao

3papers

60citations

Novelty58%

AI Score46

Ranked #60,652 of 205,806 authors (top 29%)#1,324 in CR (top 18%)

3 Papers

LGDec 8, 2022

Skellam Mixture Mechanism: a Novel Approach to Federated Learning with Differential Privacy

Ergute Bao, Yizheng Zhu, Xiaokui Xiao et al.

Deep neural networks have strong capabilities of memorizing the underlying training data, which can be a serious privacy concern. An effective solution to this problem is to train models with differential privacy, which provides rigorous privacy guarantees by injecting random noise to the gradients. This paper focuses on the scenario where sensitive data are distributed among multiple participants, who jointly train a model through federated learning (FL), using both secure multiparty computation (MPC) to ensure the confidentiality of each gradient update, and differential privacy to avoid data leakage in the resulting model. A major challenge in this setting is that common mechanisms for enforcing DP in deep learning, which inject real-valued noise, are fundamentally incompatible with MPC, which exchanges finite-field integers among the participants. Consequently, most existing DP mechanisms require rather high noise levels, leading to poor model utility. Motivated by this, we propose Skellam mixture mechanism (SMM), an approach to enforce DP on models built via FL. Compared to existing methods, SMM eliminates the assumption that the input gradients must be integer-valued, and, thus, reduces the amount of noise injected to preserve DP. Further, SMM allows tight privacy accounting due to the nice composition and sub-sampling properties of the Skellam distribution, which are key to accurate deep learning with DP. The theoretical analysis of SMM is highly non-trivial, especially considering (i) the complicated math of differentially private deep learning in general and (ii) the fact that the mixture of two Skellam distributions is rather complex, and to our knowledge, has not been studied in the DP literature. Extensive experiments on various practical settings demonstrate that SMM consistently and significantly outperforms existing solutions in terms of the utility of the resulting model.

CROct 18, 2022

DPIS: An Enhanced Mechanism for Differentially Private SGD with Importance Sampling

Jianxin Wei, Ergute Bao, Xiaokui Xiao et al.

Nowadays, differential privacy (DP) has become a well-accepted standard for privacy protection, and deep neural networks (DNN) have been immensely successful in machine learning. The combination of these two techniques, i.e., deep learning with differential privacy, promises the privacy-preserving release of high-utility models trained with sensitive data such as medical records. A classic mechanism for this purpose is DP-SGD, which is a differentially private version of the stochastic gradient descent (SGD) optimizer commonly used for DNN training. Subsequent approaches have improved various aspects of the model training process, including noise decay schedule, model architecture, feature engineering, and hyperparameter tuning. However, the core mechanism for enforcing DP in the SGD optimizer remains unchanged ever since the original DP-SGD algorithm, which has increasingly become a fundamental barrier limiting the performance of DP-compliant machine learning solutions. Motivated by this, we propose DPIS, a novel mechanism for differentially private SGD training that can be used as a drop-in replacement of the core optimizer of DP-SGD, with consistent and significant accuracy gains over the latter. The main idea is to employ importance sampling (IS) in each SGD iteration for mini-batch selection, which reduces both sampling variance and the amount of random noise injected to the gradients that is required to satisfy DP. Integrating IS into the complex mathematical machinery of DP-SGD is highly non-trivial. DPIS addresses the challenge through novel mechanism designs, fine-grained privacy analysis, efficiency enhancements, and an adaptive gradient clipping optimization. Extensive experiments on four benchmark datasets, namely MNIST, FMNIST, CIFAR-10 and IMDb, demonstrate the superior effectiveness of DPIS over existing solutions for deep learning with differential privacy.

54.6CRMay 20Code

Auditing Apple's DifferentialPrivacy.framework: Implementation Bugs, Misconfigurations, and Practical Risks

Rishav Chourasia, Ergute Bao, Uzair Javaid et al.

Since 2016, Apple has claimed that device analytics collected to improve user experience are protected by differential privacy (DP). Apple's DifferentialPrivacy.framework is deployed across its operating systems and handles sensitive signals such as Safari domains, keyboard events, photo attributes, and health-related reports. Because Apple has not open-sourced its privatization algorithms, these privacy claims have been difficult to verify independently. We present a client-side audit of Apple's DP framework on macOS Sonoma 14.2 and Sequoia 15.6. We reverse engineer the shipped binaries, recover Objective-C interfaces, build runtime harnesses that execute Apple's deployed mechanisms, and test whether their outputs match the advertised privacy guarantees. Our audit covers nearly all active deployed mechanisms, including Count Median Sketch, Hadamard-CMS, randomized-response mechanisms, and Prio-style secure aggregation. We find multiple implementation bugs and misconfigurations. Every audited mechanism that relies on floating-point noise fails to meet its advertised DP or zero-knowledge proof guarantee, due to insecure samplers with known floating-point vulnerabilities. We also find secure-aggregation configurations with local DP disabled, exposing pre-aggregation records to any party with access to those logs. Overall, we find DP violations in 5 of 9 audited mechanisms, affecting 87% of data collection in macOS Sonoma and 68% in Sequoia. We also identify public leaked iPhone logs that can be decoded to recover private information, including Safari domains and keyboard emoji signals.