LGDec 21, 2022
SoK: Let the Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine LearningAhmed Salem, Giovanni Cherubin, David Evans et al.
Deploying machine learning models in production may allow adversaries to infer sensitive information about training data. There is a vast literature analyzing different types of inference risks, ranging from membership inference to reconstruction attacks. Inspired by the success of games (i.e., probabilistic experiments) to study security properties in cryptography, some authors describe privacy inference risks in machine learning using a similar game-based style. However, adversary capabilities and goals are often stated in subtly different ways from one presentation to the other, which makes it hard to relate and compose results. In this paper, we present a game-based framework to systematize the body of knowledge on privacy inference risks in machine learning. We use this framework to (1) provide a unifying structure for definitions of inference risks, (2) formally establish known relations among definitions, and (3) to uncover hitherto unknown relations that would have been difficult to spot otherwise.
LGJun 10, 2022
Bayesian Estimation of Differential PrivacySantiago Zanella-Béguelin, Lukas Wutschitz, Shruti Tople et al.
Algorithms such as Differentially Private SGD enable training machine learning models with formal privacy guarantees. However, there is a discrepancy between the protection that such algorithms guarantee in theory and the protection they afford in practice. An emerging strand of work empirically estimates the protection afforded by differentially private training as a confidence interval for the privacy budget $\varepsilon$ spent on training a model. Existing approaches derive confidence intervals for $\varepsilon$ from confidence intervals for the false positive and false negative rates of membership inference attacks. Unfortunately, obtaining narrow high-confidence intervals for $ε$ using this method requires an impractically large sample size and training as many models as samples. We propose a novel Bayesian method that greatly reduces sample size, and adapt and validate a heuristic to draw more than one sample per trained model. Our Bayesian method exploits the hypothesis testing interpretation of differential privacy to obtain a posterior for $\varepsilon$ (not just a confidence interval) from the joint posterior of the false positive and false negative rates of membership inference attacks. For the same sample size and confidence, we derive confidence intervals for $\varepsilon$ around 40% narrower than prior work. The heuristic, which we adapt from label-only DP, can be used to further reduce the number of trained models needed to get enough samples by up to 2 orders of magnitude.
LGNov 27, 2023
Rethinking Privacy in Machine Learning Pipelines from an Information Flow Control PerspectiveLukas Wutschitz, Boris Köpf, Andrew Paverd et al.
Modern machine learning systems use models trained on ever-growing corpora. Typically, metadata such as ownership, access control, or licensing information is ignored during training. Instead, to mitigate privacy risks, we rely on generic techniques such as dataset sanitization and differentially private model training, with inherent privacy/utility trade-offs that hurt model performance. Moreover, these techniques have limitations in scenarios where sensitive information is shared across multiple participants and fine-grained access control is required. By ignoring metadata, we therefore miss an opportunity to better address security, privacy, and confidentiality challenges. In this paper, we take an information flow control perspective to describe machine learning systems, which allows us to leverage metadata such as access control policies and define clear-cut privacy and confidentiality guarantees with interpretable information flows. Under this perspective, we contrast two different approaches to achieve user-level non-interference: 1) fine-tuning per-user models, and 2) retrieval augmented models that access user-specific datasets at inference time. We compare these two approaches to a trivially non-interfering zero-shot baseline using a public model and to a baseline that fine-tunes this model on the whole corpus. We evaluate trained models on two datasets of scientific articles and demonstrate that retrieval augmented architectures deliver the best utility, scalability, and flexibility while satisfying strict non-interference guarantees.
CRMay 29, 2025Code
Securing AI Agents with Information-Flow ControlManuel Costa, Boris Köpf, Aashish Kolluri et al.
As AI agents become increasingly autonomous and capable, ensuring their security against vulnerabilities such as prompt injection becomes critical. This paper explores the use of information-flow control (IFC) to provide security guarantees for AI agents. We present a formal model to reason about the security and expressiveness of agent planners. Using this model, we characterize the class of properties enforceable by dynamic taint-tracking and construct a taxonomy of tasks to evaluate security and utility trade-offs of planner designs. Informed by this exploration, we present Fides, a planner that tracks confidentiality and integrity labels, deterministically enforces security policies, and introduces novel primitives for selectively hiding information. Its evaluation in AgentDojo demonstrates that this approach enables us to complete a broad range of tasks with security guarantees. A tutorial to walk readers through the the concepts introduced in the paper can be found at https://github.com/microsoft/fides
CRDec 12, 2023
Maatphor: Automated Variant Analysis for Prompt Injection AttacksAhmed Salem, Andrew Paverd, Boris Köpf
Prompt injection has emerged as a serious security threat to large language models (LLMs). At present, the current best-practice for defending against newly-discovered prompt injection techniques is to add additional guardrails to the system (e.g., by updating the system prompt or using classifiers on the input and/or output of the model.) However, in the same way that variants of a piece of malware are created to evade anti-virus software, variants of a prompt injection can be created to evade the LLM's guardrails. Ideally, when a new prompt injection technique is discovered, candidate defenses should be tested not only against the successful prompt injection, but also against possible variants. In this work, we present, a tool to assist defenders in performing automated variant analysis of known prompt injection attacks. This involves solving two main challenges: (1) automatically generating variants of a given prompt according, and (2) automatically determining whether a variant was effective based only on the output of the model. This tool can also assist in generating datasets for jailbreak and prompt injection attacks, thus overcoming the scarcity of data in this domain. We evaluate Maatphor on three different types of prompt injection tasks. Starting from an ineffective (0%) seed prompt, Maatphor consistently generates variants that are at least 60% effective within the first 40 iterations.
CRFeb 22, 2024
Closed-Form Bounds for DP-SGD against Record-level InferenceGiovanni Cherubin, Boris Köpf, Andrew Paverd et al.
Machine learning models trained with differentially-private (DP) algorithms such as DP-SGD enjoy resilience against a wide range of privacy attacks. Although it is possible to derive bounds for some attacks based solely on an $(\varepsilon,δ)$-DP guarantee, meaningful bounds require a small enough privacy budget (i.e., injecting a large amount of noise), which results in a large loss in utility. This paper presents a new approach to evaluate the privacy of machine learning models against specific record-level threats, such as membership and attribute inference, without the indirection through DP. We focus on the popular DP-SGD algorithm, and derive simple closed-form bounds. Our proofs model DP-SGD as an information theoretic channel whose inputs are the secrets that an attacker wants to infer (e.g., membership of a data record) and whose outputs are the intermediate model parameters produced by iterative optimization. We obtain bounds for membership inference that match state-of-the-art techniques, whilst being orders of magnitude faster to compute. Additionally, we present a novel data-dependent bound against attribute inference. Our results provide a direct, interpretable, and practical way to evaluate the privacy of trained models against specific inference threats without sacrificing utility.
CRFeb 11
Optimizing Agent Planning for Security and AutonomyAashish Kolluri, Rishi Sharma, Manuel Costa et al.
Indirect prompt injection attacks threaten AI agents that execute consequential actions, motivating deterministic system-level defenses. Such defenses can provably block unsafe actions by enforcing confidentiality and integrity policies, but currently appear costly: they reduce task completion rates and increase token usage compared to probabilistic defenses. We argue that existing evaluations miss a key benefit of system-level defenses: reduced reliance on human oversight. We introduce autonomy metrics to quantify this benefit: the fraction of consequential actions an agent can execute without human-in-the-loop (HITL) approval while preserving security. To increase autonomy, we design a security-aware agent that (i) introduces richer HITL interactions, and (ii) explicitly plans for both task progress and policy compliance. We implement this agent design atop an existing information-flow control defense against prompt injection and evaluate it on the AgentDojo and WASP benchmarks. Experiments show that this approach yields higher autonomy without sacrificing utility.
CRMay 14, 2021
Revizor: Testing Black-box CPUs against Speculation ContractsOleksii Oleksenko, Christof Fetzer, Boris Köpf et al.
Speculative vulnerabilities such as Spectre and Meltdown expose speculative execution state that can be exploited to leak information across security domains via side-channels. Such vulnerabilities often stay undetected for a long time as we lack the tools for systematic testing of CPUs to find them. In this paper, we propose an approach to automatically detect microarchitectural information leakage in commercial black-box CPUs. We build on speculation contracts, which we employ to specify the permitted side effects of program execution on the CPU's microarchitectural state. We propose a Model-based Relational Testing (MRT) technique to empirically assess the CPU compliance with these specifications. We implement MRT in a testing framework called Revizor, and showcase its effectiveness on real Intel x86 CPUs. Revizor automatically detects violations of a rich set of contracts, or indicates their absence. A highlight of our findings is that Revizor managed to automatically surface Spectre, MDS, and LVI, as well as several previously unknown variants.
CRJun 6, 2020
Hardware-Software Contracts for Secure SpeculationMarco Guarnieri, Boris Köpf, Jan Reineke et al.
Since the discovery of Spectre, a large number of hardware mechanisms for secure speculation has been proposed. Intuitively, more defensive mechanisms are less efficient but can securely execute a larger class of programs, while more permissive mechanisms may offer more performance but require more defensive programming. Unfortunately, there are no hardware-software contracts that would turn this intuition into a basis for principled co-design. In this paper, we put forward a framework for specifying such contracts, and we demonstrate its expressiveness and flexibility. On the hardware side, we use the framework to provide the first formalization and comparison of the security guarantees provided by a representative class of mechanisms for secure speculation. On the software side, we use the framework to characterize program properties that guarantee secure co-design in two scenarios traditionally investigated in isolation: (1) ensuring that a benign program does not leak information while computing on confidential data, and (2) ensuring that a potentially malicious program cannot read outside of its designated sandbox. Finally, we show how the properties corresponding to both scenarios can be checked based on existing tools for software verification, and we use them to validate our findings on executable code.
CRMay 28, 2020
Flushgeist: Cache Leaks from Beyond the FlushPepe Vila, Andreas Abel, Marco Guarnieri et al.
Flushing the cache, using instructions like clflush and wbinvd, is commonly proposed as a countermeasure against access-based cache attacks. In this report, we show that several Intel caches, specifically the L1 caches in some pre-Skylake processors and the L2 caches in some post-Broadwell processors, leak information even after being flushed through clflush and wbinvd instructions. That is, security-critical assumptions about the behavior of clflush and wbinvd instructions are incorrect, and countermeasures that rely on them should be revised.
LGDec 17, 2019
Analyzing Information Leakage of Updates to Natural Language ModelsSantiago Zanella-Béguelin, Lukas Wutschitz, Shruti Tople et al.
To continuously improve quality and reflect changes in data, machine learning applications have to regularly retrain and update their core models. We show that a differential analysis of language model snapshots before and after an update can reveal a surprising amount of detailed information about changes in the training data. We propose two new metrics---\emph{differential score} and \emph{differential rank}---for analyzing the leakage due to updates of natural language models. We perform leakage analysis using these metrics across models trained on several different datasets using different methods and configurations. We discuss the privacy implications of our findings, propose mitigation strategies and evaluate their effect.
CRDec 20, 2018
SPECTECTOR: Principled Detection of Speculative Information FlowsMarco Guarnieri, Boris Köpf, José F. Morales et al.
Since the advent of SPECTRE, a number of countermeasures have been proposed and deployed. Rigorously reasoning about their effectiveness, however, requires a well-defined notion of security against speculative execution attacks, which has been missing until now. In this paper (1) we put forward speculative non-interference, the first semantic notion of security against speculative execution attacks, and (2) we develop SPECTECTOR, an algorithm based on symbolic execution to automatically prove speculative non-interference, or to detect violations. We implement SPECTECTOR in a tool, which we use to detect subtle leaks and optimizations opportunities in the way major compilers place SPECTRE countermeasures. A scalability analysis indicates that checking speculative non-interference does not exhibit fundamental bottlenecks beyond those inherited by symbolic execution.
CROct 2, 2018
Theory and Practice of Finding Eviction SetsPepe Vila, Boris Köpf, José Francisco Morales
Many micro-architectural attacks rely on the capability of an attacker to efficiently find small eviction sets: groups of virtual addresses that map to the same cache set. This capability has become a decisive primitive for cache side-channel, rowhammer, and speculative execution attacks. Despite their importance, algorithms for finding small eviction sets have not been systematically studied in the literature. In this paper, we perform such a systematic study. We begin by formalizing the problem and analyzing the probability that a set of random virtual addresses is an eviction set. We then present novel algorithms, based on ideas from threshold group testing, that reduce random eviction sets to their minimal core in linear time, improving over the quadratic state-of-the-art. We complement the theoretical analysis of our algorithms with a rigorous empirical evaluation in which we identify and isolate factors that affect their reliability in practice, such as adaptive cache replacement strategies and TLB thrashing. Our results indicate that our algorithms enable finding small eviction sets much faster than before, and under conditions where this was previously deemed impractical.
CRJul 3, 2018
On the Incomparability of Cache Algorithms in Terms of Timing LeakagePablo Cañones, Boris Köpf, Jan Reineke
Modern computer architectures rely on caches to reduce the latency gap between the CPU and main memory. While indispensable for performance, caches pose a serious threat to security because they leak information about memory access patterns of programs via execution time. In this paper, we present a novel approach for reasoning about the security of cache algorithms with respect to timing leaks. The basis of our approach is the notion of leak competitiveness, which compares the leakage of two cache algorithms on every possible program. Based on this notion, we prove the following two results: First, we show that leak competitiveness is symmetric in the cache algorithms. This implies that no cache algorithm dominates another in terms of leakage via a program's total execution time. This is in contrast to performance, where it is known that such dominance relationships exist. Second, when restricted to caches with finite control, the leak-competitiveness relationship between two cache algorithms is either asymptotically linear or constant. No other shapes are possible.
CRFeb 22, 2017
Loophole: Timing Attacks on Shared Event Loops in ChromePepe Vila, Boris Köpf
Event-driven programming (EDP) is the prevalent paradigm for graphical user interfaces, web clients, and it is rapidly gaining importance for server-side and network programming. Central components of EDP are {\em event loops}, which act as FIFO queues that are used by processes to store and dispatch messages received from other processes. In this paper we demonstrate that shared event loops are vulnerable to side-channel attacks, where a spy process monitors the loop usage pattern of other processes by enqueueing events and measuring the time it takes for them to be dispatched. Specifically, we exhibit attacks against the two central event loops in Google's Chrome web browser: that of the I/O thread of the host process, which multiplexes all network events and user actions, and that of the main thread of the renderer processes, which handles rendering and Javascript tasks. For each of these loops, we show how the usage pattern can be monitored with high resolution and low overhead, and how this can be abused for malicious purposes, such as web page identification, user behavior detection, and covert communication.
CRJan 23, 2017
Security Analysis of Cache Replacement PoliciesPablo Cañones, Boris Köpf, Jan Reineke
Modern computer architectures share physical resources between different programs in order to increase area-, energy-, and cost-efficiency. Unfortunately, sharing often gives rise to side channels that can be exploited for extracting or transmitting sensitive information. We currently lack techniques for systematic reasoning about this interplay between security and efficiency. In particular, there is no established way for quantifying security properties of shared caches. In this paper, we propose a novel model that enables us to characterize important security properties of caches. Our model encompasses two aspects: (1) The amount of information that can be absorbed by a cache, and (2) the amount of information that can effectively be extracted from the cache by an adversary. We use our model to compute both quantities for common cache replacement policies (FIFO, LRU, and PLRU) and to compare their isolation properties. We further show how our model for information extraction leads to an algorithm that can be used to improve the bounds delivered by the CacheAudit static analyzer.
CRMar 7, 2016
Rigorous Analysis of Software Countermeasures against Cache AttacksGoran Doychev, Boris Köpf
CPU caches introduce variations into the execution time of programs that can be exploited by adversaries to recover private information about users or cryptographic keys. Establishing the security of countermeasures against this threat often requires intricate reasoning about the interactions of program code, memory layout, and hardware architecture and has so far only been done for restricted cases. In this paper we devise novel techniques that provide support for bit-level and arithmetic reasoning about memory accesses in the presence of dynamic memory allocation. These techniques enable us to perform the first rigorous analysis of widely deployed software countermeasures against cache attacks on modular exponentiation, based on executable code.