28.8CRApr 16
It's a Feature, Not a Bug: Secure and Auditable State Rollback for Confidential Cloud ApplicationsQuinn Burke, Anjo Vahldiek-Oberwagner, Michael Swift et al.
Replay and rollback attacks threaten cloud application integrity by reintroducing authentic yet stale data through an untrusted storage interface to compromise application decision-making. Prior security frameworks mitigate these attacks by enforcing forward-only state transitions (state continuity) with hardware-backed mechanisms, but they categorically treat all rollback as malicious and thus preclude legitimate rollbacks used for operational recovery from corruption or misconfiguration. We present Rebound, a general-purpose security framework that preserves rollback protection while enabling policy-authorized legitimate rollbacks of application binaries, configuration, and data. Key to Rebound is a reference monitor that mediates state transitions, enforces authorization policy, guarantees atomicity of state updates and rollbacks, and emits a tamper-evident log that provides transparency to applications and auditors. We analyze Rebound's security properties and show through an application case study -- with software deployment workflows in GitLab CI -- that it enables robust control over binary, configuration, and raw data versioning with low end-to-end overhead.
56.0CRMar 12
Cascade: Composing Software-Hardware Attack Gadgets for Adversarial Threat Amplification in Compound AI SystemsSarbartha Banerjee, Prateek Sahu, Anjo Vahldiek-Oberwagner et al.
Rapid progress in generative AI has given rise to Compound AI systems - pipelines comprised of multiple large language models (LLM), software tools and database systems. Compound AI systems are constructed on a layered traditional software stack running on a distributed hardware infrastructure. Many of the diverse software components are vulnerable to traditional security flaws documented in the Common Vulnerabilities and Exposures (CVE) database, while the underlying distributed hardware infrastructure remains exposed to timing attacks, bit-flip faults, and power-based side channels. Today, research targets LLM-specific risks like model extraction, training data leakage, and unsafe generation -- overlooking the impact of traditional system vulnerabilities. This work investigates how traditional software and hardware vulnerabilities can complement LLM-specific algorithmic attacks to compromise the integrity of a compound AI pipeline. We demonstrate two novel attacks that combine system-level vulnerabilities with algorithmic weaknesses: (1) Exploiting a software code injection flaw along with a guardrail Rowhammer attack to inject an unaltered jailbreak prompt into an LLM, resulting in an AI safety violation, and (2) Manipulating a knowledge database to redirect an LLM agent to transmit sensitive user data to a malicious application, thus breaching confidentiality. These attacks highlight the need to address traditional vulnerabilities; we systematize the attack primitives and analyze their composition by grouping vulnerabilities by their objective and mapping them to distinct stages of an attack lifecycle. This approach enables a rigorous red-teaming exercise and lays the groundwork for future defense strategies.
CRSep 9, 2020Code
Privacy-Preserving Machine Learning in Untrusted Clouds Made SimpleDayeol Lee, Dmitrii Kuvaiskii, Anjo Vahldiek-Oberwagner et al.
We present a practical framework to deploy privacy-preserving machine learning (PPML) applications in untrusted clouds based on a trusted execution environment (TEE). Specifically, we shield unmodified PyTorch ML applications by running them in Intel SGX enclaves with encrypted model parameters and encrypted input data to protect the confidentiality and integrity of these secrets at rest and during runtime. We use the open-source Graphene library OS with transparent file encryption and SGX-based remote attestation to minimize porting effort and seamlessly provide file protection and attestation. Our approach is completely transparent to the machine learning application: the developer and the end-user do not need to modify the ML application in any way.
CRNov 20, 2024
SoK: A Systems Perspective on Compound AI Threats and CountermeasuresSarbartha Banerjee, Prateek Sahu, Mulong Luo et al.
Large language models (LLMs) used across enterprises often use proprietary models and operate on sensitive inputs and data. The wide range of attack vectors identified in prior research - targeting various software and hardware components used in training and inference - makes it extremely challenging to enforce confidentiality and integrity policies. As we advance towards constructing compound AI inference pipelines that integrate multiple large language models (LLMs), the attack surfaces expand significantly. Attackers now focus on the AI algorithms as well as the software and hardware components associated with these systems. While current research often examines these elements in isolation, we find that combining cross-layer attack observations can enable powerful end-to-end attacks with minimal assumptions about the threat model. Given, the sheer number of existing attacks at each layer, we need a holistic and systemized understanding of different attack vectors at each layer. This SoK discusses different software and hardware attacks applicable to compound AI systems and demonstrates how combining multiple attack mechanisms can reduce the threat model assumptions required for an isolated attack. Next, we systematize the ML attacks in lines with the Mitre Att&ck framework to better position each attack based on the threat model. Finally, we outline the existing countermeasures for both software and hardware layers and discuss the necessity of a comprehensive defense strategy to enable the secure and high-performance deployment of compound AI systems.
CRAug 8, 2021
The Endokernel: Fast, Secure, and Programmable Subprocess VirtualizationBumjin Im, Fangfei Yang, Chia-Che Tsai et al.
Commodity applications contain more and more combinations of interacting components (user, application, library, and system) and exhibit increasingly diverse tradeoffs between isolation, performance, and programmability. We argue that the challenge of future runtime isolation is best met by embracing the multi-principle nature of applications, rethinking process architecture for fast and extensible intra-process isolation. We present, the Endokernel, a new process model and security architecture that nests an extensible monitor into the standard process for building efficient least-authority abstractions. The Endokernel introduces a new virtual machine abstraction for representing subprocess authority, which is enforced by an efficient self-isolating monitor that maps the abstraction to system level objects (processes, threads, files, and signals). We show how the Endokernel can be used to develop specialized separation abstractions using an exokernel-like organization to provide virtual privilege rings, which we use to reorganize and secure NGINX. Our prototype, includes a new syscall monitor, the nexpoline, and explores the tradeoffs of implementing it with diverse mechanisms, including Intel Control Enhancement Technology. Overall, we believe sub-process isolation is a must and that the Endokernel exposes an essential set of abstractions for realizing this in a simple and feasible way.
CRFeb 25, 2021
Swivel: Hardening WebAssembly against SpectreShravan Narayan, Craig Disselkoen, Daniel Moghimi et al.
We describe Swivel, a new compiler framework for hardening WebAssembly (Wasm) against Spectre attacks. Outside the browser, Wasm has become a popular lightweight, in-process sandbox and is, for example, used in production to isolate different clients on edge clouds and function-as-a-service platforms. Unfortunately, Spectre attacks can bypass Wasm's isolation guarantees. Swivel hardens Wasm against this class of attacks by ensuring that potentially malicious code can neither use Spectre attacks to break out of the Wasm sandbox nor coerce victim code-another Wasm client or the embedding process-to leak secret data. We describe two Swivel designs, a software-only approach that can be used on existing CPUs, and a hardware-assisted approach that uses extension available in Intel 11th generation CPUs. For both, we evaluate a randomized approach that mitigates Spectre and a deterministic approach that eliminates Spectre altogether. Our randomized implementations impose under 10.3% overhead on the Wasm-compatible subset of SPEC 2006, while our deterministic implementations impose overheads between 3.3% and 240.2%. Though high on some benchmarks, Swivel's overhead is still between 9x and 36.3x smaller than existing defenses that rely on pipeline fences.
CRJan 21, 2018
ERIM: Secure, Efficient In-process Isolation with Memory Protection Keys (MPK)Anjo Vahldiek-Oberwagner, Eslam Elnikety, Nuno O. Duarte et al.
Isolating sensitive state and data can increase the security and robustness of many applications. Examples include protecting cryptographic keys against exploits like OpenSSL's Heartbleed bug or protecting a language runtime from native libraries written in unsafe languages. When runtime references across isolation boundaries occur relatively infrequently, then conventional page-based hardware isolation can be used, because the cost of kernel- or hypervisor-mediated domain switching is tolerable. However, some applications, such as the isolation of cryptographic session keys in network-facing services, require very frequent domain switching. In such applications, the overhead of kernel- or hypervisor-mediated domain switching is prohibitive. In this paper, we present ERIM, a novel technique that provides hardware-enforced isolation with low overhead on x86 CPUs, even at high switching rates (ERIM's measured overhead is less than 1% for 100,000 switches per second). The key idea is to combine protection keys (MPKs), a feature recently added to x86 that allows protection domain switches in userspace, with binary inspection to prevent circumvention. We show that ERIM can be applied with little effort to new and existing applications, doesn't require compiler changes, can run on a stock Linux kernel, and has low runtime overhead even at high domain switching rates.