28.0CRMar 12Code
Hunting CUDA Bugs at Scale with cuFuzzMohamed Tarek Ibn ziad, Christos Kozyrakis
GPUs play an increasingly important role in modern software. However, the heterogeneous host-device execution model and expanding software stacks make GPU programs prone to memory-safety and concurrency bugs that evade static analysis. While fuzz-testing, combined with dynamic error checking tools, offers a plausible solution, it remains underutilized for GPUs. In this work, we identify three main obstacles limiting prior GPU fuzzing efforts: (1) kernel-level fuzzing leading to false positives, (2) lack of device-side coverage-guided feedback, and (3) incompatibility between coverage and sanitization tools. We present cuFuzz, the first CUDA-oriented fuzzer that makes GPU fuzzing practical by addressing these obstacles. cuFuzz uses whole program fuzzing to avoid false positives from independently fuzzing device-side kernels. It leverages NVBit to instrument device-side instructions and merges the resultant coverage with compiler-based host coverage. Finally, cuFuzz decouples sanitization from coverage collection by executing host- and device-side sanitizers in separate processes. cuFuzz uncovers 43 previously unknown bugs (19 in commercial libraries) across 14 CUDA programs, including illegal memory accesses, uninitialized reads, and data races. cuFuzz achieves significantly more discovered edges and unique inputs compared to baseline approaches, especially on closed-source targets. Moreover, we quantify the execution time overheads of the different cuFuzz components and add persistent-mode support to improve the overall fuzzing throughput. Our results demonstrate that cuFuzz is an effective and deployable addition to the GPU testing toolbox. cuFuzz is publicly available at https://github.com/NVlabs/cuFuzz/.
CRJul 27, 2020
SPAM: Stateless Permutation of Application MemoryMohamed Tarek Ibn Ziad, Miguel A. Arroyo, Simha Sethumadhavan
In this paper, we propose the Stateless Permutation of Application Memory (SPAM), a software defense that enables fine-grained data permutation for C programs. The key benefits include resilience against attacks that directly exploit software errors (i.e., spatial and temporal memory safety violations) in addition to attacks that exploit hardware vulnerabilities such as ColdBoot, RowHammer or hardware side-channels to disclose or corrupt memory using a single cohesive technique. Unlike prior work, SPAM is stateless by design making it automatically applicable to multi-threaded applications. We implement SPAM as an LLVM compiler pass with an extension to the compiler-rt runtime. We evaluate it on the C subset of the SPEC2017 benchmark suite and three real-world applications: the Nginx web server, the Duktape Javascript interpreter, and the WolfSSL cryptographic library. We further show SPAM's scalability by running a multi-threaded benchmark suite. SPAM has greater security coverage and comparable performance overheads to state-of-the-art software techniques for memory safety on contemporary x86_64 processors. Our security evaluation confirms SPAM's effectiveness in preventing intra/inter spatial/temporal memory violations by making the attacker success chances as low as 1/16!.
CRNov 5, 2019
Using Name Confusion to Enhance SecurityMohamed Tarek Ibn Ziad, Miguel A. Arroyo, Evgeny Manzhosov et al.
We introduce a novel concept, called Name Confusion, and demonstrate how it can be employed to thwart multiple classes of code-reuse attacks. By building upon Name Confusion, we derive Phantom Name System (PNS): a security protocol that provides multiple names (addresses) to program instructions. Unlike the conventional model of virtual memory with a one-to-one mapping between instructions and virtual memory addresses, PNS creates N mappings for the same instruction, and randomly switches between them at runtime. PNS achieves fast randomization, at the granularity of basic blocks, which mitigates a class of attacks known as (just-in-time) code-reuse. If an attacker uses a memory safety-related vulnerability to cause any of the instruction addresses to be different from the one chosen during a fetch, the exploited program will crash. We quantitatively evaluate how PNS mitigates real-world code-reuse attacks by reducing the success probability of typical exploits to approximately $10^{-12}$. We implement PNS and validate it by running SPEC CPU2017 benchmark suite. We further verify its practicality by adding it to a RISC-V core on an FPGA. Lastly, PNS is mainly designed for resource constrained (wimpy) devices and has negligible performance overhead, compared to commercially-available, state-of-the-art, hardware-based protections.