LG · Jun 5, 2023
Information Flow Control in Machine Learning through Modular Model Architecture
Trishita Tiwari, Suchin Gururangan, Chuan Guo et al. · allen-ai
In today's machine learning (ML) models, any part of the training data can affect the model output. This lack of control over information flow from training data to model output is a major obstacle to training models on sensitive data when access control allows each user to access only a subset of the data. To enable secure machine learning for access-controlled data, we propose the notion of information flow control (IFC) for machine learning, and develop an extension to the Transformer language model architecture that strictly adheres to the IFC definition we propose. Our architecture controls information flow by limiting the influence of training data from each security domain to a single expert module, and enables only a subset of experts at inference time based on the access control policy. The evaluation using large text and code datasets shows that our proposed parametric IFC architecture has minimal (1.9%) performance overhead and can significantly improve model accuracy (by 38% for the text dataset, and by 44% to 62% for the code datasets) by enabling training on access-controlled data.
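The gating idea in this abstract can be illustrated with a toy sketch. It is not the authors' architecture: simple per-domain token counters stand in for Transformer expert modules, and the class name `DomainExperts`, the domain labels, and the uniform averaging of enabled experts are all assumptions made for illustration. What it shows is the property the abstract describes: each domain's data touches exactly one expert, and experts outside the caller's access policy contribute nothing at inference time.

```python
from collections import Counter, defaultdict


class DomainExperts:
    """Toy stand-in for a modular model: one count-based 'expert' per security domain."""

    def __init__(self):
        self.experts = {}  # domain name -> Counter of token counts

    def train(self, domain, tokens):
        # Training data from a domain only ever updates that domain's expert.
        self.experts.setdefault(domain, Counter()).update(tokens)

    def predict(self, allowed_domains):
        # Only experts permitted by the access policy are enabled.
        enabled = [self.experts[d] for d in allowed_domains if d in self.experts]
        if not enabled:
            return {}
        mixed = defaultdict(float)
        for counts in enabled:
            total = sum(counts.values())
            for token, n in counts.items():
                # Uniform average over enabled experts (an assumption of this sketch).
                mixed[token] += (n / total) / len(enabled)
        return dict(mixed)


model = DomainExperts()
model.train("public", ["the", "cat"])
model.train("hr", ["salary", "the"])

public_only = model.predict({"public"})
public_and_hr = model.predict({"public", "hr"})
```

A caller restricted to `"public"` gets a distribution in which the `"hr"`-only token `"salary"` never appears, while a caller cleared for both domains benefits from the extra training data.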
77.3 · CR · May 11
Engineering Robustness into Personal Agents with the AI Workflow Store
Roxana Geambasu, Mariana Raykova, Pierre Tholoniat et al.
The dominant paradigm for AI agents is an "on-the-fly" loop in which agents synthesize plans and execute actions within seconds or minutes in response to user prompts. We argue that this paradigm short-circuits disciplined software engineering (SE) processes -- iterative design, rigorous testing, adversarial evaluation, staged deployment, and more -- that have delivered the (relatively) reliable and secure systems we use today. By focusing on rapid, real-time synthesis, are AI agents effectively delivering users improvised prototypes rather than systems fit for high-stakes scenarios in which users may unwittingly apply them? This paper argues for the need to integrate rigorous SE processes into the agentic loop to produce production-grade, hardened, and deterministically constrained agent *workflows* that substantially outperform the potentially brittle and vulnerable results of on-the-fly synthesis. Doing so may require extra compute and time, and if so, we must amortize the cost of rigor through reuse across a broad user community. We envision an *AI Workflow Store* that consists of hardened and reusable workflows that agents can invoke with far greater reliability and security than improvised tool chains. We outline the research challenges of this vision, which stem from a broader flexibility-robustness tension that we argue requires moving beyond the "on-the-fly" paradigm to navigate effectively.
CL · Dec 15, 2024
Sequence-Level Leakage Risk of Training Data in Large Language Models
Trishita Tiwari, G. Edward Suh
This work quantifies the risk of training data leakage from LLMs (Large Language Models) using sequence-level probabilities. Computing extraction probabilities for individual sequences provides finer-grained information than has been studied in prior benchmarking work. We re-analyze the effects of decoding schemes, model sizes, prefix lengths, partial sequence leakages, and token positions to uncover new insights that were not possible in previous works due to their choice of metrics. We perform this study on two pre-trained models, Llama and OPT, trained on the Common Crawl and The Pile, respectively. We discover that: (1) Extraction Rate, the predominant metric used in prior quantification work, underestimates the threat of leakage of training data in randomized LLMs by as much as 2.14x. (2) Although, on average, larger models and longer prefixes can extract more data, this is not true for a substantial portion of individual sequences: 30.4-41.5% of our sequences are easier to extract with either shorter prefixes or smaller models. (3) Contrary to previous beliefs, partial leakage in commonly used decoding schemes like top-k and top-p is not easier than leaking verbatim training data. The aim of this work is to encourage the adoption of this metric for future work on quantification of training data extraction.
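A small worked example may help show why a per-sequence probability carries more information than a binary extraction check. The sketch below is an illustration under stated assumptions, not the paper's exact metric: the function names are hypothetical, the per-token probabilities are made up, greedy decoding is used as the stand-in for a binary extraction test, and the "at least one success in n independent samples" formula is an assumption of this sketch.

```python
import math


def sequence_log_prob(token_log_probs):
    # Log-probability of the full suffix: sum of per-token log-probabilities.
    return sum(token_log_probs)


def prob_extracted_within(token_log_probs, n_samples):
    # Chance that at least one of n independent random samples reproduces the suffix.
    p = math.exp(sequence_log_prob(token_log_probs))
    return 1.0 - (1.0 - p) ** n_samples


def greedy_extracts(token_ranks):
    # Greedy decoding reproduces the suffix only if every target token is ranked first.
    return all(rank == 0 for rank in token_ranks)


# Hypothetical suffix of three tokens; the second token is only the model's second choice.
token_probs = [0.9, 0.4, 0.8]
token_ranks = [0, 1, 0]
log_probs = [math.log(p) for p in token_probs]

single_sample_prob = math.exp(sequence_log_prob(log_probs))
ten_sample_prob = prob_extracted_within(log_probs, 10)
binary_verdict = greedy_extracts(token_ranks)
```

Here the binary check reports the sequence as not extractable, even though a single random sample reproduces it with probability of about 0.29 and ten samples almost certainly do.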
LG · Feb 21
Prior Aware Memorization: An Efficient Metric for Distinguishing Memorization from Generalization in Large Language Models
Trishita Tiwari, Ari Trachtenberg, G. Edward Suh
Training data leakage from Large Language Models (LLMs) raises serious concerns related to privacy, security, and copyright compliance. A central challenge in assessing this risk is distinguishing genuine memorization of training data from the generation of statistically common sequences. Existing approaches to measuring memorization often conflate these phenomena, labeling outputs as memorized even when they arise from generalization over common patterns. Counterfactual Memorization provides a principled solution by comparing models trained with and without a target sequence, but its reliance on retraining multiple baseline models makes it computationally expensive and impractical at scale. This work introduces Prior-Aware Memorization, a theoretically grounded, lightweight and training-free criterion for identifying genuine memorization in LLMs. The key idea is to evaluate whether a candidate suffix is strongly associated with its specific training prefix or whether it appears with high probability across many unrelated prompts due to statistical commonality. We evaluate this metric on text from the training corpora of two pre-trained models, LLaMA and OPT, using both long sequences (to simulate copyright risks) and named entities (to simulate PII leakage). Our results show that between 55% and 90% of sequences previously labeled as memorized are in fact statistically common. Similar findings hold for the SATML training data extraction challenge dataset, where roughly 40% of sequences exhibit common-pattern behavior despite appearing only once in the training data. These results demonstrate that low frequency alone is insufficient evidence of memorization and highlight the importance of accounting for model priors when assessing leakage.
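The key idea stated above (comparing how likely a suffix is after its own training prefix with how likely it is after unrelated prompts) can be sketched as follows. This is a minimal illustration, not the paper's criterion: the ratio form, the threshold of 10, the function names, and the toy probability table standing in for a real model are all assumptions.

```python
from statistics import mean


def prior_aware_score(suffix_prob, train_prefix, suffix, unrelated_prefixes):
    # Ratio of the suffix probability under its training prefix to its
    # average probability under unrelated prefixes (the model's "prior").
    p_train = suffix_prob(train_prefix, suffix)
    p_prior = mean(suffix_prob(p, suffix) for p in unrelated_prefixes)
    return p_train / max(p_prior, 1e-12)


def looks_memorized(score, threshold=10.0):
    # Hypothetical decision rule: flag only suffixes tied to their own prefix.
    return score >= threshold


def toy_suffix_prob(prefix, suffix):
    # Stand-in for a language model's P(suffix | prefix); values are invented.
    if suffix == "Thank you for your time.":
        return 0.6  # statistically common: likely after almost any prefix
    if prefix == "Alice's access code is" and suffix == "a9f3-77b2":
        return 0.8  # likely only after its specific training prefix
    return 1e-4


unrelated = ["Dear team,", "In summary,"]
secret_score = prior_aware_score(
    toy_suffix_prob, "Alice's access code is", "a9f3-77b2", unrelated
)
common_score = prior_aware_score(
    toy_suffix_prob, "Best regards.", "Thank you for your time.", unrelated
)
```

Both suffixes are generated with high probability after their training prefix, but only the access code scores far above the prior, so only it is flagged by `looks_memorized`.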
CR · Jan 4, 2019
Page Cache Attacks
Daniel Gruss, Erik Kraft, Trishita Tiwari et al.
We present a new hardware-agnostic side-channel attack that targets one of the most fundamental software caches in modern computer systems: the operating system page cache. The page cache is a pure software cache that contains all disk-backed pages, including program binaries, shared libraries, and other files, and our attacks thus work across cores and CPUs. Our side channel permits unprivileged monitoring of some memory accesses of other processes, with a spatial resolution of 4KB and a temporal resolution of 2 microseconds on Linux (restricted to 6.7 measurements per second) and 466 nanoseconds on Windows (restricted to 223 measurements per second); this is roughly the same order of magnitude as the current state-of-the-art cache attacks. We systematically analyze our side channel by demonstrating different local attacks, including a sandbox-bypassing high-speed covert channel, timed user-interface redressing attacks, and an attack recovering automatically generated temporary passwords. We further show that we can trade off the side channel's hardware-agnostic property for remote exploitability. We demonstrate this via a low-profile remote covert channel that uses this page-cache side channel to exfiltrate information from a malicious sender process through innocuous server requests. Finally, we propose mitigations for some of our attacks, which have been acknowledged by operating system vendors and slated for future security patches.
CR · Jul 7, 2018
Nothing But Net: Invading Android User Privacy Using Only Network Access Patterns
Mikhail Andreev, Avi Klausner, Trishita Tiwari et al.
We evaluate the power of simple network side channels to violate user privacy on Android devices. Specifically, we show that, using blackbox network metadata alone (i.e., traffic statistics such as transmission time and size of packets), it is possible to infer several elements of a user's location and also identify their web browsing history (i.e., which sites they visited). We do this with relatively simple learning and classification methods and basic network statistics. For most Android phones currently on the market, such process-level traffic statistics are available for any running process, without any permissions control and at a fine level of detail, although, as we demonstrate, even device-level statistics are sufficient for some of our attacks. In effect, it may be possible for any application running on these phones to identify privacy-revealing elements of a user's location, for example, correlating travel with places of worship, point-of-care medical establishments, or political activity.