Peinan Li

4 papers

21 citations

Novelty 57%

AI Score 41

Ranked #90,207 of 201,326 authors (top 45%)

#2,250 in CR (top 31%)

4 Papers

CR May 11

Janus: Compiler-Based Defense Against Transient Execution Attacks Using ARM Hardware Primitives

Ciyan Ouyang, Peinan Li, Yubiao Huang et al.

We present Janus, a compiler-based security framework that mitigates transient execution attacks like Spectre and control-flow hijacking on ARM64 platforms. Janus integrates speculative execution and control-flow dependencies with pointer authentication (PA) modifiers, using the PA and Branch Target Identification (BTI) hardware features to prevent control-flow speculation attacks and to secure both control flow and speculative execution through existing control-flow integrity mechanisms. To optimize performance, Janus minimizes overhead by merging defense operations across different defense layers (modifier fusion) and reusing registers of protected variables (carrier reuse), while maintaining strong security guarantees. Evaluation on SPEC CPU2017 shows an average performance overhead of 3.85%, with real-world applications exhibiting overheads ranging from 2.97% to 7.80%. Janus offers effective speculative execution security with low performance and code-size overhead, making it a robust solution for ARM-based systems.

CR Dec 2, 2020

PiPoMonitor: Mitigating Cross-core Cache Attacks Using the Auto-Cuckoo Filter

Fengkai Yuan, Kai Wang, Rui Hou et al.

Cache side-channel attacks obtain the victim's cache line access footprint to infer security-critical information. Among them, cross-core attacks that exploit the shared last-level cache are more threatening because they are simple to set up and offer high capacity. Stateful detection-based mitigations observe precise cache behaviors and protect specific cache lines that are suspected of being attacked. However, their recording structures incur large storage overhead and are vulnerable to reverse engineering attacks. Exploring the intrinsic non-deterministic layout of a traditional Cuckoo filter, this paper proposes a space-efficient Auto-Cuckoo filter to record access footprints, which both decreases storage overhead and resists reverse engineering attacks. With the Auto-Cuckoo filter, we propose PiPoMonitor to detect Ping-Pong patterns and prefetch specific cache lines to interfere with adversaries' cache probes. Security analysis shows that PiPoMonitor can effectively mitigate cross-core attacks and that the Auto-Cuckoo filter is immune to reverse engineering attacks. Evaluation results indicate that PiPoMonitor has negligible impact on performance and that the storage overhead is only 0.37%, an order of magnitude lower than previous stateful approaches.
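As a rough illustration of the data structure the abstract builds on, the sketch below implements a traditional cuckoo filter, not the Auto-Cuckoo variant, whose details are not given here. The class name, bucket sizes, 8-bit fingerprints, and the SHA-256-based hashing are all assumptions chosen for readability. The random choice of which fingerprint to evict is what makes the final layout depend on insertion history rather than on the keys alone.

```python
import hashlib
import random


class CuckooFilter:
    """Minimal traditional cuckoo filter (hypothetical parameters)."""

    def __init__(self, num_buckets=64, bucket_size=4, max_kicks=32, seed=None):
        # A power-of-two bucket count keeps the XOR-based alternate index in range.
        assert num_buckets & (num_buckets - 1) == 0
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item: bytes) -> int:
        return self._hash(b"fp" + item) & 0xFF  # 8-bit fingerprint

    def _alt_index(self, index: int, fp: int) -> int:
        # Partial-key cuckoo hashing: applying this twice returns the original index.
        return (index ^ self._hash(bytes([fp]))) % self.num_buckets

    def insert(self, item: bytes) -> bool:
        fp = self._fingerprint(item)
        i1 = self._hash(item) % self.num_buckets
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a random resident and relocate it.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # filter considered full; the last evicted fingerprint is dropped

    def contains(self, item: bytes) -> bool:
        fp = self._fingerprint(item)
        i1 = self._hash(item) % self.num_buckets
        i2 = self._alt_index(i1, fp)
        return fp in self.buckets[i1] or fp in self.buckets[i2]


# Record a few hypothetical cache line addresses, then query them.
cf = CuckooFilter(seed=0)
lines = [f"line-{n:#x}".encode() for n in range(50)]
inserted = [cf.insert(line) for line in lines]
found = [cf.contains(line) for line in lines]
```

A filter like this can report false positives but, as long as every insert succeeds, it never misses an item that was recorded.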

CR May 17, 2020

A Lightweight Isolation Mechanism for Secure Branch Predictors

Lutan Zhao, Peinan Li, Rui Hou et al.

Recently exposed vulnerabilities reveal the necessity to improve the security of branch predictors. Branch predictors record history about the execution of different programs, and such information from different processes is stored in the same structure and is thus accessible across processes. This leaves attackers with opportunities for malicious training and malicious perception. Instead of flush-based or physical isolation of hardware resources, we want to achieve isolation of the content in these hardware tables with some lightweight processing using randomization, as follows. (1) Content encoding. We propose to use hardware-based thread-private random numbers to encode the contents of the branch predictor tables (both direction and destination histories), which we call XOR-BP. Specifically, the data is encoded by an XOR operation with the key before being written to the table and decoded after being read from the table. Such a mechanism obfuscates the information, making cross-process or cross-privilege-level analysis and perception more difficult. It achieves an effect similar to logical isolation but adds little space or time overhead. (2) Index encoding. We propose a randomized index mechanism for the branch predictor (Noisy-XOR-BP). Similar to XOR-BP, another thread-private random number is used together with the branch instruction address as the input to compute the index into the branch predictor. This randomized indexing mechanism disrupts the correspondence between the branch instruction address and the branch predictor entry, thus increasing the noise for malicious perception attacks. Our analyses using an FPGA-based RISC-V processor prototype and additional auxiliary simulations suggest that the proposed mechanisms incur a very small performance cost while providing strong protection.
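The two encodings described above can be sketched in software as follows. This is a simplified model under stated assumptions, not the authors' hardware design: the class name, table sizes, the `pc >> 2` index function, and the use of `secrets.randbits` as a stand-in for hardware-generated thread-private random numbers are all hypothetical.

```python
import secrets


class XorBranchTargetTable:
    """Toy branch-target table shared by all threads (illustrative only)."""

    def __init__(self, index_bits=8, target_bits=32):
        self.index_mask = (1 << index_bits) - 1
        self.target_bits = target_bits
        self.target_mask = (1 << target_bits) - 1
        self.entries = [0] * (1 << index_bits)
        self.content_keys = {}  # thread id -> key used for content encoding
        self.index_keys = {}    # thread id -> key used for index encoding

    def register_thread(self, tid, content_key=None, index_key=None):
        # Assumption: keys are drawn in software here; the abstract describes
        # hardware-based thread-private random numbers.
        if content_key is None:
            content_key = secrets.randbits(self.target_bits)
        if index_key is None:
            index_key = secrets.randbits(self.target_bits)
        self.content_keys[tid] = content_key & self.target_mask
        self.index_keys[tid] = index_key

    def _index(self, tid, pc):
        # Index encoding: mix the branch address with a thread-private key.
        return ((pc >> 2) ^ self.index_keys[tid]) & self.index_mask

    def update(self, tid, pc, target):
        # Content encoding: XOR with the key before writing to the table.
        encoded = (target ^ self.content_keys[tid]) & self.target_mask
        self.entries[self._index(tid, pc)] = encoded

    def predict(self, tid, pc):
        # Decode after reading; another thread's key yields a scrambled value.
        return self.entries[self._index(tid, pc)] ^ self.content_keys[tid]


table = XorBranchTargetTable()
# Same index key on purpose, so both threads hit the same entry.
table.register_thread("victim", content_key=0x1234, index_key=0x0F)
table.register_thread("attacker", content_key=0xABCD, index_key=0x0F)
table.update("victim", 0x4000, 0x8000)
victim_view = table.predict("victim", 0x4000)      # recovers 0x8000
attacker_view = table.predict("attacker", 0x4000)  # differs from 0x8000
```

Even when both threads land on the same entry, the attacker decodes with a different key and does not see the victim's stored target. With distinct index keys, the two threads would usually not even address the same entry.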

CR Apr 9, 2019

Enabling Privacy-Preserving, Compute- and Data-Intensive Computing using Heterogeneous Trusted Execution Environment

Jianping Zhu, Rui Hou, XiaoFeng Wang et al.

There is an urgent demand for privacy-preserving techniques capable of supporting compute- and data-intensive (CDI) computing in the era of big data. However, none of the existing TEEs can truly support CDI computing tasks, as CDI requires high-throughput accelerators like GPUs and TPUs, but TEEs do not offer security protection for such accelerators. This paper presents HETEE (Heterogeneous TEE), the first TEE design capable of strongly protecting heterogeneous computing with unsecure accelerators. HETEE is uniquely constructed to work with today's servers and does not require any changes to existing commercial CPUs or accelerators. The key idea of our design is to run a security controller as a stand-alone computing system that dynamically adjusts the boundary between the secure and insecure worlds through the PCIe switches, rendering control of an accelerator to the host OS when it is not needed for secure computing, and shifting it back when it is. The controller is the only trusted unit in the system, and it runs the custom OS and accelerator runtimes, together with the encryption, authentication, and remote attestation components. The host server and other computing systems communicate with the controller through an in-memory task queue that accommodates the computing tasks offloaded to HETEE, in the form of encrypted and signed code and data. HETEE also offers a generic and efficient programming model to the host CPU. We have implemented the HETEE design on a hardware prototype system and evaluated it with large-scale neural network inference and training tasks. Our evaluations show that HETEE can easily support such secure computing tasks and incurs only a 12.34% throughput overhead for inference and 9.87% overhead for training on average.
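The task-queue idea can be illustrated with a small sketch in which tasks travel through an untrusted queue and the receiving side rejects anything that fails authentication. This is only a toy model under loud assumptions: the function names, the shared demo key, and the use of HMAC-SHA256 are hypothetical, and the encryption step mentioned in the abstract is omitted because the Python standard library has no authenticated cipher.

```python
import hashlib
import hmac
from collections import deque


def seal_task(key: bytes, payload: bytes):
    """Attach an authentication tag to a task (encryption omitted in this sketch)."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload, tag


def open_task(key: bytes, task):
    """Return the payload if its tag verifies, otherwise None."""
    payload, tag = task
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # tampered or forged task is rejected
    return payload


# Hypothetical key shared by the task submitter and the controller.
key = b"demo-session-key"
queue = deque()  # stands in for the in-memory task queue

queue.append(seal_task(key, b"run inference on batch 0"))

# Simulate tampering while the task sits in untrusted memory.
payload, tag = seal_task(key, b"run inference on batch 1")
queue.append((payload + b"!", tag))

results = [open_task(key, queue.popleft()) for _ in range(2)]
```

Here the first task is accepted unchanged and the modified one is dropped, which is the property an authenticated queue is meant to provide to the trusted side.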