Ari Trachtenberg

CR
8papers
114citations
Novelty56%
AI Score41

8 Papers

LGFeb 21
Prior Aware Memorization: An Efficient Metric for Distinguishing Memorization from Generalization in Large Language Models

Trishita Tiwari, Ari Trachtenberg, G. Edward Suh

Training data leakage from Large Language Models (LLMs) raises serious concerns related to privacy, security, and copyright compliance. A central challenge in assessing this risk is distinguishing genuine memorization of training data from the generation of statistically common sequences. Existing approaches to measuring memorization often conflate these phenomena, labeling outputs as memorized even when they arise from generalization over common patterns. Counterfactual Memorization provides a principled solution by comparing models trained with and without a target sequence, but its reliance on retraining multiple baseline models makes it computationally expensive and impractical at scale. This work introduces Prior-Aware Memorization, a theoretically grounded, lightweight and training-free criterion for identifying genuine memorization in LLMs. The key idea is to evaluate whether a candidate suffix is strongly associated with its specific training prefix or whether it appears with high probability across many unrelated prompts due to statistical commonality. We evaluate this metric on text from the training corpora of two pre-trained models, LLaMA and OPT, using both long sequences (to simulate copyright risks) and named entities (to simulate PII leakage). Our results show that between 55% and 90% of sequences previously labeled as memorized are in fact statistically common. Similar findings hold for the SATML training data extraction challenge dataset, where roughly 40% of sequences exhibit common-pattern behavior despite appearing only once in the training data. These results demonstrate that low frequency alone is insufficient evidence of memorization and highlight the importance of accounting for model priors when assessing leakage.

CYMar 30, 2020
Anonymous Collocation Discovery: Harnessing Privacy to Tame the Coronavirus

Ran Canetti, Ari Trachtenberg, Mayank Varia

Successful containment of the Coronavirus pandemic rests on the ability to quickly and reliably identify those who have been in close proximity to a contagious individual. Existing tools for doing so rely on the collection of exact location information of individuals over lengthy time periods, and combining this information with other personal information. This unprecedented encroachment on individual privacy at national scales has created an outcry and risks rejection of these tools. We propose an alternative: an extremely simple scheme for providing fine-grained and timely alerts to users who have been in the close vicinity of an infected individual. Crucially, this is done while preserving the anonymity of all individuals, and without collecting or storing any personal information or location history. Our approach is based on using short-range communication mechanisms, like Bluetooth, that are available in all modern cell phones. It can be deployed with very little infrastructure, and incurs a relatively low false-positive rate compared to other collocation methods. We also describe a number of extensions and tradeoffs. We believe that the privacy guarantees provided by the scheme will encourage quick and broad voluntary adoption. When combined with sufficient testing capacity and existing best practices from healthcare professionals, we hope that this may significantly reduce the infection rate.

CRDec 23, 2019
Characterizing Orphan Transactions in the Bitcoin Network

Muhammad Anas Imtiaz, David Starobinski, Ari Trachtenberg

Orphan transactions are those whose parental income-sources are missing at the time that they are processed. These transactions are not propagated to other nodes until all of their missing parents are received, and they thus end up languishing in a local buffer until evicted or their parents are found. Although there has been little work in the literature on characterizing the nature and impact of such orphans, it is intuitive that they may affect throughput on the Bitcoin network. This work thus seeks to methodically research such effects through a measurement campaign of orphan transactions on live Bitcoin nodes. Our data show that, surprisingly, orphan transactions tend to have fewer parents on average than non-orphan transactions. Moreover, the salient features of their missing parents are a lower fee and larger size than their non-orphan counterparts, resulting in a lower transaction fee per byte. Finally, we note that the network overhead incurred by these orphan transactions can be significant, exceeding 17% when using the default orphan memory pool size (100 transactions). However, this overhead can be made negligible, without significant computational or memory demands, if the pool size is merely increased to 1000 transactions.

CRJan 10, 2019
Collaborative Privacy for Web Applications

Yihao Hu, Ari Trachtenberg, Prakash Ishwar

Real-time, online-editing web apps provide free and convenient services for collaboratively editing, sharing and storing files. The benefits of these web applications do not come for free: not only do service providers have full access to the users' files, but they also control access, transmission, and storage mechanisms for them. As a result, user data may be at risk of data mining, third-party interception, or even manipulation. To combat this, we propose a new system for helping to preserve the privacy of user data within collaborative environments. There are several distinct challenges in producing such a system, including developing an encryption mechanism that does not interfere with the back-end (and often proprietary) control mechanisms utilized by the service, and identifying transparent code hooks through which to obfuscate user data. Toward the first challenge, we develop a character-level encryption scheme that is more resilient to the types of attacks that plague classical substitution ciphers. For the second challenge, we design a browser extension that robustly demonstrates the feasibility of our approach, and show a concrete implementation for Google Chrome and the widely-used Google Docs platform. Our example tangibly demonstrates how several users with a shared key can collaboratively and transparently edit a Google Docs document without revealing the plaintext directly to Google.

CRJan 4, 2019
Page Cache Attacks

Daniel Gruss, Erik Kraft, Trishita Tiwari et al.

We present a new hardware-agnostic side-channel attack that targets one of the most fundamental software caches in modern computer systems: the operating system page cache. The page cache is a pure software cache that contains all disk-backed pages, including program binaries, shared libraries, and other files, and our attacks thus work across cores and CPUs. Our side-channel permits unprivileged monitoring of some memory accesses of other processes, with a spatial resolution of 4KB and a temporal resolution of 2 microseconds on Linux (restricted to 6.7 measurements per second) and 466 nanoseconds on Windows (restricted to 223 measurements per second); this is roughly the same order of magnitude as the current state-of-the-art cache attacks. We systematically analyze our side channel by demonstrating different local attacks, including a sandbox bypassing high-speed covert channel, timed user-interface redressing attacks, and an attack recovering automatically generated temporary passwords. We further show that we can trade off the side channel's hardware agnostic property for remote exploitability. We demonstrate this via a low profile remote covert channel that uses this page-cache side-channel to exfiltrate information from a malicious sender process through innocuous server requests. Finally, we propose mitigations for some of our attacks, which have been acknowledged by operating system vendors and slated for future security patches.

CRJul 7, 2018
Nothing But Net: Invading Android User Privacy Using Only Network Access Patterns

Mikhail Andreev, Avi Klausner, Trishita Tiwari et al.

We evaluate the power of simple networks side-channels to violate user privacy on Android devices. Specifically, we show that, using blackbox network metadata alone (i.e., traffic statistics such as transmission time and size of packets) it is possible to infer several elements of a user's location and also identify their web browsing history (i.e, which sites they visited). We do this with relatively simple learning and classification methods and basic network statistics. For most Android phones currently on the market, such process-level traffic statistics are available for any running process, without any permissions control and at fine-grained details, although, as we demonstrate, even device-level statistics are sufficient for some of our attacks. In effect, it may be possible for any application running on these phones to identify privacy-revealing elements of a user's location, for example, correlating travel with places of worship, point-of-care medical establishments, or political activity.

CRMar 17, 2018
Improving Bitcoin's Resilience to Churn

Nabeel Younis, Muhammad Anas Imtiaz, David Starobinski et al.

Efficient and reliable block propagation on the Bitcoin network is vital for ensuring the scalability of this peer-to-peer network. To this end, several schemes have been proposed over the last few years to speed up the block propagation, most notably the compact block protocol (BIP 152). Despite this, we show experimental evidence that nodes that have recently joined the network may need about ten days until this protocol becomes 90% effective. This problem is endemic for nodes that do not have persistent network connectivity. We propose to mitigate this ineffectiveness by maintaining mempool synchronization among Bitcoin nodes. For this purpose, we design and implement into Bitcoin a new prioritized data synchronization protocol, called FalafelSync. Our experiments show that FalafelSync helps intermittently connected nodes to maintain better consistency with more stable nodes, thereby showing promise for improving block propagation in the broader network. In the process, we have also developed an effective logging mechanism for bitcoin nodes we release for public use.

CRJan 29, 2014
Securing Smartphones: A Micro-TCB Approach

Yossi Gilad, Amir Herzberg, Ari Trachtenberg

As mobile phones have evolved into `smartphones', with complex operating systems running third- party software, they have become increasingly vulnerable to malicious applications (malware). We introduce a new design for mitigating malware attacks against smartphone users, based on a small trusted computing base module, denoted uTCB. The uTCB manages sensitive data and sensors, and provides core services to applications, independently of the operating system. The user invokes uTCB using a simple secure attention key, which is pressed in order to validate physical possession of the device and authorize a sensitive action; this protects private information even if the device is infected with malware. We present a proof-of-concept implementation of uTCB based on ARM's TrustZone, a secure execution environment increasingly found in smartphones, and evaluate our implementation using simulations.