Wenjie Xiong

CR
h-index15
13papers
300citations
Novelty59%
AI Score48

13 Papers

CRNov 25, 2022Code
MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision Transformer with Heterogeneous Attention

Wenxuan Zeng, Meng Li, Wenjie Xiong et al.

Secure multi-party computation (MPC) enables computation directly on encrypted data and protects both data and model privacy in deep learning inference. However, existing neural network architectures, including Vision Transformers (ViTs), are not designed or optimized for MPC and incur significant latency overhead. We observe Softmax accounts for the major latency bottleneck due to a high communication complexity, but can be selectively replaced or linearized without compromising the model accuracy. Hence, in this paper, we propose an MPC-friendly ViT, dubbed MPCViT, to enable accurate yet efficient ViT inference in MPC. Based on a systematic latency and accuracy evaluation of the Softmax attention and other attention variants, we propose a heterogeneous attention optimization space. We also develop a simple yet effective MPC-aware neural architecture search algorithm for fast Pareto optimization. To further boost the inference efficiency, we propose MPCViT+, to jointly optimize the Softmax attention and other network components, including GeLU, matrix multiplication, etc. With extensive experiments, we demonstrate that MPCViT achieves 1.9%, 1.3% and 3.6% higher accuracy with 6.2x, 2.9x and 1.9x latency reduction compared with baseline ViT, MPCFormer and THE-X on the Tiny-ImageNet dataset, respectively. MPCViT+ further achieves a better Pareto front compared with MPCViT. The code and models for evaluation are available at https://github.com/PKU-SEC-Lab/mpcvit.

LGJun 5, 2023
Information Flow Control in Machine Learning through Modular Model Architecture

Trishita Tiwari, Suchin Gururangan, Chuan Guo et al. · allen-ai

In today's machine learning (ML) models, any part of the training data can affect the model output. This lack of control for information flow from training data to model output is a major obstacle in training models on sensitive data when access control only allows individual users to access a subset of data. To enable secure machine learning for access-controlled data, we propose the notion of information flow control for machine learning, and develop an extension to the Transformer language model architecture that strictly adheres to the IFC definition we propose. Our architecture controls information flow by limiting the influence of training data from each security domain to a single expert module, and only enables a subset of experts at inference time based on the access control policy.The evaluation using large text and code datasets show that our proposed parametric IFC architecture has minimal (1.9%) performance overhead and can significantly improve model accuracy (by 38% for the text dataset, and between 44%--62% for the code datasets) by enabling training on access-controlled data.

LGSep 12, 2022
Cocktail Party Attack: Breaking Aggregation-Based Privacy in Federated Learning using Independent Component Analysis

Sanjay Kariyappa, Chuan Guo, Kiwan Maeng et al.

Federated learning (FL) aims to perform privacy-preserving machine learning on distributed data held by multiple data owners. To this end, FL requires the data owners to perform training locally and share the gradient updates (instead of the private inputs) with the central server, which are then securely aggregated over multiple data owners. Although aggregation by itself does not provably offer privacy protection, prior work showed that it may suffice if the batch size is sufficiently large. In this paper, we propose the Cocktail Party Attack (CPA) that, contrary to prior belief, is able to recover the private inputs from gradients aggregated over a very large batch size. CPA leverages the crucial insight that aggregate gradients from a fully connected layer is a linear combination of its inputs, which leads us to frame gradient inversion as a blind source separation (BSS) problem (informally called the cocktail party problem). We adapt independent component analysis (ICA)--a classic solution to the BSS problem--to recover private inputs for fully-connected and convolutional networks, and show that CPA significantly outperforms prior gradient inversion attacks, scales to ImageNet-sized inputs, and works on large batch sizes of up to 1024.

CRJan 26, 2023
GPU-based Private Information Retrieval for On-Device Machine Learning Inference

Maximilian Lam, Jeff Johnson, Wenjie Xiong et al.

On-device machine learning (ML) inference can enable the use of private user data on user devices without revealing them to remote servers. However, a pure on-device solution to private ML inference is impractical for many applications that rely on embedding tables that are too large to be stored on-device. In particular, recommendation models typically use multiple embedding tables each on the order of 1-10 GBs of data, making them impractical to store on-device. To overcome this barrier, we propose the use of private information retrieval (PIR) to efficiently and privately retrieve embeddings from servers without sharing any private information. As off-the-shelf PIR algorithms are usually too computationally intensive to directly use for latency-sensitive inference tasks, we 1) propose novel GPU-based acceleration of PIR, and 2) co-design PIR with the downstream ML application to obtain further speedup. Our GPU acceleration strategy improves system throughput by more than $20 \times$ over an optimized CPU PIR implementation, and our PIR-ML co-design provides an over $5 \times$ additional throughput improvement at fixed model quality. Together, for various on-device ML applications such as recommendation and language modeling, our system on a single V100 GPU can serve up to $100,000$ queries per second -- a $>100 \times$ throughput improvement over a CPU-based baseline -- while maintaining model accuracy.

CEDec 12, 2022
Data Leakage via Access Patterns of Sparse Features in Deep Learning-based Recommendation Systems

Hanieh Hashemi, Wenjie Xiong, Liu Ke et al.

Online personalized recommendation services are generally hosted in the cloud where users query the cloud-based model to receive recommended input such as merchandise of interest or news feed. State-of-the-art recommendation models rely on sparse and dense features to represent users' profile information and the items they interact with. Although sparse features account for 99% of the total model size, there was not enough attention paid to the potential information leakage through sparse features. These sparse features are employed to track users' behavior, e.g., their click history, object interactions, etc., potentially carrying each user's private information. Sparse features are represented as learned embedding vectors that are stored in large tables, and personalized recommendation is performed by using a specific user's sparse feature to index through the tables. Even with recently-proposed methods that hides the computation happening in the cloud, an attacker in the cloud may be able to still track the access patterns to the embedding tables. This paper explores the private information that may be learned by tracking a recommendation model's sparse feature access patterns. We first characterize the types of attacks that can be carried out on sparse features in recommendation models in an untrusted cloud, followed by a demonstration of how each of these attacks leads to extracting users' private information or tracking users by their behavior over time.

LGJan 15
A Sustainable AI Economy Needs Data Deals That Work for Generators

Ruoxi Jia, Luis Oala, Wenjie Xiong et al.

We argue that the machine learning value chain is structurally unsustainable due to an economic data processing inequality: each state in the data cycle from inputs to model weights to synthetic outputs refines technical signal but strips economic equity from data generators. We show, by analyzing seventy-three public data deals, that the majority of value accrues to aggregators, with documented creator royalties rounding to zero and widespread opacity of deal terms. This is not just an economic welfare concern: as data and its derivatives become economic assets, the feedback loop that sustains current learning algorithms is at risk. We identify three structural faults - missing provenance, asymmetric bargaining power, and non-dynamic pricing - as the operational machinery of this inequality. In our analysis, we trace these problems along the machine learning value chain and propose an Equitable Data-Value Exchange (EDVEX) Framework to enable a minimal market that benefits all participants. Finally, we outline research directions where our community can make concrete contributions to data deals and contextualize our position with related and orthogonal viewpoints.

CRMay 5
LIPPEN: A Lightweight In-Place Pointer Encryption Architecture for Pointer Integrity

Erfan Iravani, Lalit Prasad Peri, Mohannad Ismail et al.

Memory-safety violations in C and C++ programs continue to enable sophisticated exploitation techniques such as control-flow hijacking and data-oriented attacks. Existing hardware defenses either rely on address space layout randomization (ASLR) or attach explicit metadata to pointers to verify their integrity. External metadata schemes provide strong guarantees, but incur additional memory accesses and memory footprint overhead. In-place authentication mechanisms, such as ARM Pointer Authentication (PAC), achieve low overhead at the cost of limited entropy and susceptibility to brute-force and reuse attacks. This paper presents LIPPEN, a hardware-software co-design for full-pointer encryption that provides strong pointer integrity and confidentiality with zero metadata overhead. LIPPEN treats every pointer as an encrypted block, cryptographically binding it to its execution context and decrypting it transparently at dereference time. By re-purposing the entire 64-bit pointer field for encryption rather than preserving raw address bits, LIPPEN maximizes entropy, eliminates the brute-force weaknesses of truncated authentication codes, and maintains binary compatibility with existing PAC-enabled software. We prototype LIPPEN on FPGA using 64-bit RISC-V Rocket and BOOM cores, and evaluate it with microbenchmarks, nbench, and SPEC CPU2017. We compare against both an in-house RISC-V PAC implementation and Apple's PAC on the M1 processor. Across these workloads, LIPPEN provides comprehensive pointer protection with runtime overhead comparable to PAC-based schemes, while incurring negligible area and power overhead. These results show that LIPPEN is a practical design point for deploying strong pointer protection in real processors.

CRFeb 24, 2025
Towards Reinforcement Learning for Exploration of Speculative Execution Vulnerabilities

Evan Lai, Wenjie Xiong, Edward Suh et al.

Speculative attacks such as Spectre can leak secret information without being discovered by the operating system. Speculative execution vulnerabilities are finicky and deep in the sense that to exploit them, it requires intensive manual labor and intimate knowledge of the hardware. In this paper, we introduce SpecRL, a framework that utilizes reinforcement learning to find speculative execution leaks in post-silicon (black box) microprocessors.

CRJun 26, 2021
Evaluation of Cache Attacks on Arm Processors and Secure Caches

Shuwen Deng, Nikolay Matyunin, Wenjie Xiong et al.

Timing-based side and covert channels in processor caches continue to be a threat to modern computers. This work shows for the first time a systematic, large-scale analysis of Arm devices and the detailed results of attacks the processors are vulnerable to. Compared to x86, Arm uses different architectures, microarchitectural implementations, cache replacement policies, etc., which affects how attacks can be launched, and how security testing for the vulnerabilities should be done. To evaluate security, this paper presents security benchmarks specifically developed for testing Arm processors and their caches. The benchmarks are themselves evaluated with sensitivity tests, which examine how sensitive the benchmarks are to having a correct configuration in the testing phase. Further, to evaluate a large number of devices, this work leverages a novel approach of using a cloud-based Arm device testbed for architectural and security research on timing channels and runs the benchmarks on 34 different physical devices. In parallel, there has been much interest in secure caches to defend the various attacks. Consequently, this paper also investigates secure cache architectures using the proposed benchmarks. Especially, this paper implements and evaluates the secure PL and RF caches, showing the security of PL and RF caches, but also uncovers new weaknesses.

CRMay 27, 2020
Survey of Transient Execution Attacks

Wenjie Xiong, Jakub Szefer

Transient execution attacks, also called speculative execution attacks, have drawn much interest as they exploit the transient execution of instructions, e.g., during branch prediction, to leak data. Transient execution is fundamental to modern computer architectures, yet poses a security risk as has been demonstrated. Since the first disclosure of Spectre and Meltdown attacks in January 2018, a number of new attack types or variants of the attacks have been presented. These attacks have motivated computer architects to rethink the design of processors and propose hardware defenses. This paper summarizes the components and the phases of the transient execution attacks. Each of the components is further discussed and categorized. A set of metrics is proposed for each component to evaluate the feasibility of an attack. Moreover, the data that can be leaked in the attacks are summarized. Further, the existing attacks are compared, and the limitations of these attacks are discussed based on the proposed metrics. In the end, existing mitigations at the micro-architecture level from literature are discussed.

CRNov 19, 2019
A Benchmark Suite for Evaluating Caches' Vulnerability to Timing Attacks

Shuwen Deng, Wenjie Xiong, Jakub Szefer

Timing-based side or covert channels in processor caches continue to present a threat to computer systems, and they are the key to many of the recent Spectre and Meltdown attacks. Based on improvements to an existing three-step model for cache timing-based attacks, this work presents 88 Strong types of theoretical timing-based vulnerabilities in processor caches. To understand and evaluate all possible types of vulnerabilities in processor caches, this work further presents and implements a new benchmark suite which can be used to test to which types of cache timing-based attacks a given processor or cache design is vulnerable. In total, there are 1094 automatically-generated test programs which cover the 88 theoretical vulnerabilities. The benchmark suite generates the Cache Timing Vulnerability Score which can be used to evaluate how vulnerable a specific cache implementation is to different attacks. A smaller Cache Timing Vulnerability Score means the design is more secure, and the scores among different machines can be easily compared. Evaluation is conducted on commodity Intel and AMD processors and shows the differences in processor implementations can result in different types of attacks that they are vulnerable to. Beyond testing commodity processors, the benchmarks and the Cache Timing Vulnerability Score can be used to help designers of new secure processor caches evaluate their design's susceptibility to cache timing-based attacks.

CRMay 20, 2019
Leaking Information Through Cache LRU States

Wenjie Xiong, Jakub Szefer

The Least-Recently Used cache replacement policy and its variants are widely deployed in modern processors. This paper shows for the first time in detail that the LRU states of caches can be used to leak information: any access to a cache by a sender will modify the LRU state, and the receiver is able to observe this through a timing measurement. This paper presents LRU timing-based channels both when the sender and the receiver have shared memory, e.g., shared library data pages, and when they are separate processes without shared memory. In addition, the new LRU timing-based channels are demonstrated on both Intel and AMD processors in scenarios where the sender and the receiver are sharing the cache in both hyper-threaded setting and time-sliced setting. The transmission rate of the LRU channels can be up to 600Kbps per cache set in the hyper-threaded setting. Different from the majority of existing cache channels which require the sender to trigger cache misses, the new LRU channels work with the sender only having cache hits, making the channel faster and more stealthy. This paper also demonstrates that the new LRU channels can be used in transient execution attacks, e.g., Spectre. Further, this paper shows that the LRU channels pose threats to existing secure cache designs, and this work demonstrates the LRU channels affect the secure PL cache. The paper finishes by discussing and evaluating possible defenses.

CRFeb 12, 2019
Intrinsic Rowhammer PUFs: Leveraging the Rowhammer Effect for Improved Security

André Schaller, Wenjie Xiong, Nikolaos Athanasios Anagnostopoulos et al.

Physically Unclonable Functions (PUFs) have become an important and promising hardware primitive for device fingerprinting, device identification, or key storage. Intrinsic PUFs leverage components already found in existing devices, unlike extrinsic silicon PUFs, which are based on customized circuits that involve modification of hardware. In this work, we present a new type of a memory-based intrinsic PUF, which leverages the Rowhammer effect in DRAM modules; the Rowhammer PUF. Our PUF makes use of bit flips, which occur in DRAM cells due to rapid and repeated access of DRAM rows. Prior research has mainly focused on Rowhammer attacks, where the Rowhammer effect is used to illegitimately alter data stored in memory, e.g., to change page table entries or enable privilege escalation attacks. Meanwhile, this is the first work to use the Rowhammer effect in a positive context: to design a novel PUF. We extensively evaluate the Rowhammer PUF using commercial, off-the-shelf devices, not relying on custom hardware or an FPGA-based setup. The evaluation shows that the Rowhammer PUF holds required properties needed for the envisioned security applications, and could be deployed today.