Srinivas Devadas

CR
h-index2
12papers
759citations
Novelty68%
AI Score54

12 Papers

LGOct 7, 2022
Differentially Private Deep Learning with ModelMix

Hanshen Xiao, Jun Wan, Srinivas Devadas

Training large neural networks with meaningful/usable differential privacy security guarantees is a demanding challenge. In this paper, we tackle this problem by revisiting the two key operations in Differentially Private Stochastic Gradient Descent (DP-SGD): 1) iterative perturbation and 2) gradient clipping. We propose a generic optimization framework, called {\em ModelMix}, which performs random aggregation of intermediate model states. It strengthens the composite privacy analysis utilizing the entropy of the training trajectory and improves the $(ε, δ)$ DP security parameters by an order of magnitude. We provide rigorous analyses for both the utility guarantees and privacy amplification of ModelMix. In particular, we present a formal study on the effect of gradient clipping in DP-SGD, which provides theoretical instruction on how hyper-parameters should be selected. We also introduce a refined gradient clipping method, which can further sharpen the privacy loss in private learning when combined with ModelMix. Thorough experiments with significant privacy/utility improvement are presented to support our theory. We train a Resnet-20 network on CIFAR10 with $70.4\%$ accuracy via ModelMix given $(ε=8, δ=10^{-5})$ DP-budget, compared to the same performance but with $(ε=145.8,δ=10^{-5})$ using regular DP-SGD; assisted with additional public low-dimensional gradient embedding, one can further improve the accuracy to $79.1\%$ with $(ε=6.1, δ=10^{-5})$ DP-budget, compared to the same performance but with $(ε=111.2, δ=10^{-5})$ without ModelMix.

85.3ARApr 20Code
Enabling AI ASICs for Zero Knowledge Proof

Jianming Tong, Jingtian Dang, Simon Langowski et al.

Zero-knowledge proof (ZKP) provers remain costly because multi-scalar multiplication (MSM) and number-theoretic transforms (NTTs) dominate runtime as they need significant computation. AI ASICs such as TPUs provide massive matrix throughput and SotA energy efficiency. We present MORPH, the first framework that reformulates ZKP kernels to match AI-ASIC execution. We introduce Big-T complexity, a hardware-aware complexity model that exposes heterogeneous bottlenecks and layout-transformation costs ignored by Big-O. Guided by this analysis, (1) at arithmetic level, MORPH develops an MXU-centric extended-RNS lazy reduction that converts high-precision modular arithmetic into dense low-precision GEMMs, eliminating all carry chains, and (2) at dataflow level, MORPH constructs a unified-sharding layout-stationary TPU Pippenger MSM and optimized 3/5-step NTT that avoid on-TPU shuffles to minimize costly memory reorganization. Implemented in JAX, MORPH enables TPUv6e8 to achieve up-to 10x higher throughput on NTT and comparable throughput on MSM than GZKP. Our code: https://github.com/EfficientPPML/MORPH.

66.6LGMay 7
PACZero: PAC-Private Fine-Tuning of Language Models via Sign Quantization

Murat Bilgehan Ertan, Xiaochen Zhu, Phuong Ha Nguyen et al.

We introduce PACZero, a family of PAC-private zeroth-order mechanisms for fine-tuning large language models that delivers usable utility at $I(S^*; Y_{1:T})=0$. This privacy regime bounds the membership-inference attack (MIA) posterior success rate at the prior, an MIA-resistance level the DP framework matches only at $\varepsilon=0$ and infinite noise. All DP-ZO comparisons below are matched at the MIA posterior level. The key insight is that PAC Privacy charges mutual information only when the release depends on which candidate subset is the secret. Sign-quantizing subset-aggregated zeroth-order gradients creates frequent unanimity, steps at which every candidate subset agrees on the update direction; at these steps the released sign costs zero conditional mutual information. We propose two variants that span the privacy-utility trade-off: PACZero-MI (budgeted MI via exact calibration on the binary release) and PACZero-ZPL ($I=0$ via a uniform coin flip on disagreement steps). We evaluate on SST-2 and SQuAD with OPT-1.3B and OPT-6.7B in both LoRA and full-parameter tracks. On SST-2 OPT-1.3B full fine-tuning at $I=0$, PACZero-ZPL reaches ${88.99\pm0.91}$, within $2.1$pp of the non-private MeZO baseline ($91.1$ FT). No prior method produces usable utility in the high-privacy regime $\varepsilon<1$, and PACZero-ZPL obtains competitive SST-2 accuracy and nontrivial SQuAD F1 across OPT-1.3B and OPT-6.7B at $I=0$.

LGJan 20
PAC-Private Responses with Adversarial Composition

Xiaochen Zhu, Mayuri Sridhar, Srinivas Devadas

Modern machine learning models are increasingly deployed behind APIs. This renders standard weight-privatization methods (e.g. DP-SGD) unnecessarily noisy at the cost of utility. While model weights may vary significantly across training datasets, model responses to specific inputs are much lower dimensional and more stable. This motivates enforcing privacy guarantees directly on model outputs. We approach this under PAC privacy, which provides instance-based privacy guarantees for arbitrary black-box functions by controlling mutual information (MI). Importantly, PAC privacy explicitly rewards output stability with reduced noise levels. However, a central challenge remains: response privacy requires composing a large number of adaptively chosen, potentially adversarial queries issued by untrusted users, where existing composition results on PAC privacy are inadequate. We introduce a new algorithm that achieves adversarial composition via adaptive noise calibration and prove that mutual information guarantees accumulate linearly under adaptive and adversarial querying. Experiments across tabular, vision, and NLP tasks show that our method achieves high utility at extremely small per-query privacy budgets. On CIFAR-10, we achieve 87.79% accuracy with a per-step MI budget of $2^{-32}$. This enables serving one million queries while provably bounding membership inference attack (MIA) success rates to 51.08% -- the same guarantee of $(0.04, 10^{-5})$-DP. Furthermore, we show that private responses can be used to label public data to distill a publishable privacy-preserving model; using an ImageNet subset as a public dataset, our model distilled from 210,000 responses achieves 91.86% accuracy on CIFAR-10 with MIA success upper-bounded by 50.49%, which is comparable to $(0.02,10^{-5})$-DP.

OCFeb 16, 2019
Local Differential Privacy in Decentralized Optimization

Hanshen Xiao, Yu Ye, Srinivas Devadas

Privacy concerns with sensitive data are receiving increasing attention. In this paper, we study local differential privacy (LDP) in interactive decentralized optimization. By constructing random local aggregators, we propose a framework to amplify LDP by a constant. We take Alternating Direction Method of Multipliers (ADMM), and decentralized gradient descent as two concrete examples, where experiments support our theory. In an asymptotic view, we address the following question: Under LDP, is it possible to design a distributed private minimizer for arbitrary closed convex constraints with utility loss not explicitly dependent on dimensionality? As an affiliated result, we also show that with merely linear secret sharing, information theoretic privacy is achievable for bounded colluding agents.

CRJan 14, 2019
XRD: Scalable Messaging System with Cryptographic Privacy

Albert Kwon, David Lu, Srinivas Devadas

Even as end-to-end encrypted communication becomes more popular, private messaging remains a challenging problem due to metadata leakages, such as who is communicating with whom. Most existing systems that hide communication metadata either (1) do not scale easily, (2) incur significant overheads, or (3) provide weaker guarantees than cryptographic privacy, such as differential privacy or heuristic privacy. This paper presents XRD (short for Crossroads), a metadata private messaging system that provides cryptographic privacy, while scaling easily to support more users by adding more servers. At a high level, XRD uses multiple mix networks in parallel with several techniques, including a novel technique we call aggregate hybrid shuffle. As a result, XRD can support 2 million users with 251 seconds of latency with 100 servers. This is 12x and 3.7x faster than Atom and Pung, respectively, which are prior scalable messaging systems with cryptographic privacy.

CRDec 27, 2018
Sanctorum: A lightweight security monitor for secure enclaves

Ilia Lebedev, Kyle Hogan, Jules Drean et al.

Enclaves have emerged as a particularly compelling primitive to implement trusted execution environments: strongly isolated sensitive user-mode processes in a largely untrusted software environment. While the threat models employed by various enclave systems differ, the high-level guarantees they offer are essentially the same: attestation of an enclave's initial state, as well as a guarantee of enclave integrity and privacy in the presence of an adversary. This work describes Sanctorum, a small trusted code base (TCB), consisting of a generic enclave-capable system, which is sufficient to implement secure enclaves akin to the primitive offered by Intel's SGX. While enclaves may be implemented via unconditionally trusted hardware and microcode, as it is the case in SGX, we employ a smaller TCB principally consisting of authenticated, privileged software, which may be replaced or patched as needed. Sanctorum implements a formally verified specification for generic enclaves on an in-order multiprocessor system meeting baseline security requirements, e.g., the MIT Sanctum processor and the Keystone enclave framework. Sanctorum requires trustworthy hardware including a random number generator, a private cryptographic key pair derived via a secure bootstrapping protocol, and a robust isolation primitive to safeguard sensitive information. Sanctorum's threat model is informed by the threat model of the isolation primitive, and is suitable for adding enclaves to a variety of processor systems.

CRDec 24, 2018
MI6: Secure Enclaves in a Speculative Out-of-Order Processor

Thomas Bourgeat, Ilia Lebedev, Andrew Wright et al.

Recent attacks have broken process isolation by exploiting microarchitectural side channels that allow indirect access to shared microarchitectural state. Enclaves strengthen the process abstraction to restore isolation guarantees. We propose MI6, an aggressive, speculative out-of-order processor capable of providing secure enclaves under a threat model that includes an untrusted OS and an attacker capable of mounting any software attack currently considered practical, including control flow speculation attacks. MI6 is inspired by Sanctum [16] and extends its isolation guarantee to more realistic memory hierarchies. It also introduces a purge instruction, which is used only when a secure process is scheduled, and implements it for a complex processor microarchitecture. We model the performance impact of enclaves in MI6 through FPGA emulation on AWS F1 FPGAs by running SPEC CINT2006 benchmarks on top of an untrusted Linux OS. Security comes at the cost of approximately 16.4% average slowdown for protected programs.

CRFeb 28, 2018
Var-CNN: A Data-Efficient Website Fingerprinting Attack Based on Deep Learning

Sanjit Bhat, David Lu, Albert Kwon et al.

In recent years, there have been several works that use website fingerprinting techniques to enable a local adversary to determine which website a Tor user visits. While the current state-of-the-art attack, which uses deep learning, outperforms prior art with medium to large amounts of data, it attains marginal to no accuracy improvements when both use small amounts of training data. In this work, we propose Var-CNN, a website fingerprinting attack that leverages deep learning techniques along with novel insights specific to packet sequence classification. In open-world settings with large amounts of data, Var-CNN attains over $1\%$ higher true positive rate (TPR) than state-of-the-art attacks while achieving $4\times$ lower false positive rate (FPR). Var-CNN's improvements are especially notable in low-data scenarios, where it reduces the FPR of prior art by $3.12\%$ while increasing the TPR by $13\%$. Overall, insights used to develop Var-CNN can be applied to future deep learning based attacks, and substantially reduce the amount of training data needed to perform a successful website fingerprinting attack. This shortens the time needed for data collection and lowers the likelihood of having data staleness issues.

DCApr 7, 2017
Efficient Synchronous Byzantine Consensus

Ittai Abraham, Srinivas Devadas, Danny Dolev et al.

We present new protocols for Byzantine state machine replication and Byzantine agreement in the synchronous and authenticated setting. The celebrated PBFT state machine replication protocol tolerates $f$ Byzantine faults in an asynchronous setting using $3f+1$ replicas, and has since been studied or deployed by numerous works. In this work, we improve the Byzantine fault tolerance threshold to $n=2f+1$ by utilizing a relaxed synchrony assumption. We present a synchronous state machine replication protocol that commits a decision every 3 rounds in the common case. The key challenge is to ensure quorum intersection at one honest replica. Our solution is to rely on the synchrony assumption to form a post-commit quorum of size $2f+1$, which intersects at $f+1$ replicas with any pre-commit quorums of size $f+1$. Our protocol also solves synchronous authenticated Byzantine agreement in expected 8 rounds. The best previous solution (Katz and Koo, 2006) requires expected 24 rounds. Our protocols may be applied to build Byzantine fault tolerant systems or improve cryptographic protocols such as cryptocurrencies when synchrony can be assumed.

CRDec 23, 2016
Atom: Horizontally Scaling Strong Anonymity

Albert Kwon, Henry Corrigan-Gibbs, Srinivas Devadas et al.

Atom is an anonymous messaging system that protects against traffic-analysis attacks. Unlike many prior systems, each Atom server touches only a small fraction of the total messages routed through the network. As a result, the system's capacity scales near-linearly with the number of servers. At the same time, each Atom user benefits from "best possible" anonymity: a user is anonymous among all honest users of the system, against an active adversary who controls the entire network, a portion of the system's servers, and any number of malicious users. The architectural ideas behind Atom have been known in theory, but putting them into practice requires new techniques for (1) avoiding the reliance on heavy general-purpose multi-party computation protocols, (2) defeating active attacks by malicious servers at minimal performance cost, and (3) handling server failure and churn. Atom is most suitable for sending a large number of short messages, as in a microblogging application or a high-security communication bootstrapping ("dialing") for private messaging systems. We show that, on a heterogeneous network of 1,024 servers, Atom can transit a million Tweet-length messages in 28 minutes. This is over 23x faster than prior systems with similar privacy guarantees.

CRFeb 23, 2012
Path ORAM: An Extremely Simple Oblivious RAM Protocol

Emil Stefanov, Marten van Dijk, Elaine Shi et al.

We present Path ORAM, an extremely simple Oblivious RAM protocol with a small amount of client storage. Partly due to its simplicity, Path ORAM is the most practical ORAM scheme known to date with small client storage. We formally prove that Path ORAM has a O(log N) bandwidth cost for blocks of size B = Omega(log^2 N) bits. For such block sizes, Path ORAM is asymptotically better than the best known ORAM schemes with small client storage. Due to its practicality, Path ORAM has been adopted in the design of secure processors since its proposal.