40.3 · CR · May 14
Big Bird: Resilient Privacy Budgeting Across Untrusted Web Domains
Pierre Tholoniat, Alison Caulfield, Giorgio Cavicchioli et al.
The W3C Attribution API is an emerging standard for privacy-preserving advertising measurement. Its current privacy architecture enforces individual differential privacy (IDP) independently for each domain (e.g., an advertiser) issuing queries. We show that this guarantee is unsound under realistic system behavior: it fails under cross-querier data adaptivity and can also fail when shared limits are enforced across queriers. The issue is not the on-device accounting model itself -- device-epoch IDP -- but treating each querying domain in isolation. We propose Big Bird, a privacy-budget manager that makes global device-epoch IDP -- enforced jointly across all domains -- both sound and deployable for Attribution. Big Bird addresses the main obstacle to global enforcement in open multi-querier systems: denial-of-service depletion of a shared global budget by Sybil web domains. Its key insight is that benign Attribution workloads have a stock-and-flow structure: impressions create potential privacy loss, conversions realize it, and meaningful budget consumption should be tied to genuine user actions across distinct web domains. Big Bird enforces this structure with privacy-loss-based quotas on impression and conversion sites and a per-user-action cap on how many quotas can be activated, ensuring that adversarial impact scales with genuine user interactions rather than with the number of Sybil domains. We implement Big Bird in Rust, integrate it into Firefox's Attribution prototype, and evaluate it theoretically and empirically on real ad-tech data. We show that Big Bird provides rigorous global device-epoch IDP, formal resilience to depletion attacks, and utility for benign queriers under attack.
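As a rough illustration of how per-site quotas can sit in front of a shared global budget, the sketch below deducts a privacy loss from several filters in an all-or-nothing fashion. The class `BudgetFilter`, the function `try_consume`, and all capacities are hypothetical names and values chosen for illustration; they are not taken from Big Bird's Rust implementation, and the per-user-action cap described in the abstract is omitted.

```python
class BudgetFilter:
    """Tracks privacy loss spent against a fixed capacity (hypothetical helper)."""

    def __init__(self, capacity: float) -> None:
        self.capacity = capacity
        self.spent = 0.0

    def can_spend(self, eps: float) -> bool:
        return self.spent + eps <= self.capacity

    def spend(self, eps: float) -> None:
        self.spent += eps


def try_consume(filters: list, eps: float) -> bool:
    """Deduct eps from every filter, or from none if any filter would overflow."""
    if all(f.can_spend(eps) for f in filters):
        for f in filters:
            f.spend(eps)
        return True
    return False


# Assumed capacities, for illustration only.
global_budget = BudgetFilter(1.0)      # shared across all domains for a device-epoch
impression_quota = BudgetFilter(0.3)   # quota for one impression site
conversion_quota = BudgetFilter(0.5)   # quota for one conversion site
filters = [global_budget, impression_quota, conversion_quota]

first = try_consume(filters, 0.2)   # fits under every filter
second = try_consume(filters, 0.2)  # would exceed the impression-site quota
```

Because the second request is rejected by the impression-site quota before anything is deducted, the shared global budget is left untouched, which is the property that limits how much a single site can drain.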
CR · Dec 26, 2022
DPack: Efficiency-Oriented Privacy Budget Scheduling
Pierre Tholoniat, Kelly Kostopoulou, Mosharaf Chowdhury et al.
Machine learning (ML) models can leak information about users, and differential privacy (DP) provides a rigorous way to bound that leakage under a given budget. This DP budget can be regarded as a new type of compute resource in workloads of multiple ML models training on user data. Once it is used, the DP budget is forever consumed. Therefore, it is crucial to allocate it efficiently to train as many models as possible. This paper presents a scheduler for privacy that optimizes for efficiency. We formulate privacy scheduling as a new type of multidimensional knapsack problem, called privacy knapsack, which maximizes DP budget efficiency. We show that privacy knapsack is NP-hard, hence practical algorithms are necessarily approximate. We develop an approximation algorithm for privacy knapsack, DPack, and evaluate it on microbenchmarks and on a new, synthetic private-ML workload we developed from the Alibaba ML cluster trace. We show that DPack: (1) often approaches the efficiency-optimal schedule, (2) consistently schedules more tasks than a state-of-the-art privacy scheduling algorithm focused on fairness (1.3-1.7x in Alibaba, 1.0-2.6x in microbenchmarks), but (3) sacrifices some level of fairness for efficiency. Therefore, using DPack, DP ML operators should be able to train more models on the same amount of user data while offering the same privacy guarantee to their users.
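To make the knapsack framing concrete, here is a minimal greedy sketch: each task demands some budget from one or more data blocks, and tasks are admitted in decreasing order of profit per unit of normalized demand. This is a generic greedy heuristic for a multidimensional knapsack, not DPack's actual algorithm; the `Task` type, the efficiency formula, and the example numbers are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Task:
    name: str
    demand: Dict[str, float]  # block id -> epsilon requested from that block
    profit: float = 1.0       # 1.0 means "count scheduled tasks"


def greedy_privacy_knapsack(
    tasks: List[Task], capacity: Dict[str, float]
) -> Tuple[List[str], Dict[str, float]]:
    """Admit tasks by decreasing efficiency while every demanded block has budget left."""
    remaining = dict(capacity)

    def efficiency(task: Task) -> float:
        # Assumed efficiency: profit divided by the summed fraction of each block used.
        cost = sum(eps / capacity[b] for b, eps in task.demand.items())
        return task.profit / cost if cost > 0 else float("inf")

    scheduled = []
    for task in sorted(tasks, key=efficiency, reverse=True):
        if all(remaining[b] >= eps for b, eps in task.demand.items()):
            for b, eps in task.demand.items():
                remaining[b] -= eps  # budget is consumed for good, never returned
            scheduled.append(task.name)
    return scheduled, remaining


capacity = {"b0": 1.0, "b1": 1.0}
tasks = [
    Task("A", {"b0": 0.9, "b1": 0.9}),
    Task("B", {"b0": 0.5}),
    Task("C", {"b1": 0.5}),
    Task("D", {"b0": 0.5}),
]
scheduled, remaining = greedy_privacy_knapsack(tasks, capacity)
```

In this toy workload the three small tasks are admitted and the large task A is rejected, whereas admitting A first would have left room for nothing else.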
CR · Apr 22, 2022
A Tale of Two Models: Constructing Evasive Attacks on Edge Models
Wei Hao, Aahil Awatramani, Jiayang Hu et al.
Full-precision deep learning models are typically too large or costly to deploy on edge devices. To accommodate the limited hardware resources, models are adapted to the edge using various edge-adaptation techniques, such as quantization and pruning. While such techniques may have a negligible impact on top-line accuracy, the adapted models exhibit subtle differences in output compared to the original model from which they are derived. In this paper, we introduce a new evasive attack, DIVA, that exploits these differences in edge adaptation by adding adversarial noise to input data that maximizes the output difference between the original and adapted model. Such an attack is particularly dangerous because the malicious input will trick the adapted model running on the edge but will be virtually undetectable by the original model, which typically serves as the authoritative model version, used for validation, debugging and retraining. We compare DIVA to a state-of-the-art attack, PGD, and show that, under whitebox and semi-blackbox settings, DIVA is only 1.7-3.6% worse at attacking the adapted model but 1.9-4.2 times more likely not to be detected by the original model.
74.0 · OS · Apr 14
TierBPF: Page Migration Admission Control for Tiered Memory via eBPF
Xi Wang, Tal Zussman, Yuang Xu et al.
Existing software-based memory tiering systems decide which pages to place on the slower or faster tier. However, they do not take into account two important factors that greatly influence application performance: the size of the migrated pages, and the underlying hardware device and tiering topology. We introduce TierBPF, a software mechanism that can be plugged into existing memory tiering systems to take these factors into account, by making simple binary page admission decisions. TierBPF is implemented as a set of eBPF hooks, which allow users to define their own custom policies. In order to make its decisions, TierBPF utilizes a lightweight tracking mechanism for page profiling which is not dependent on the application's working set size. TierBPF, integrated into three memory tiering systems and evaluated with 17 workloads, achieves geomean throughput gains of up to 17.7% with improvements of up to 75% for individual workloads.
CR · Dec 3, 2019 · Code
Cost-Aware Robust Tree Ensembles for Security Applications
Yizheng Chen, Shiqi Wang, Weifan Jiang et al.
There are various costs for attackers to manipulate the features of security classifiers. The costs are asymmetric across features and across directions of change, which cannot be precisely captured by existing cost models based on $L_p$-norm robustness. In this paper, we utilize such domain knowledge to increase the attack cost of evading classifiers, specifically, tree ensemble models that are widely used in security tasks. We propose a new cost modeling method to capture the feature manipulation cost as a constraint, and then we integrate the cost-driven constraint into the node construction process to train robust tree ensembles. During the training process, we use the constraint to find data points that are likely to be perturbed given the feature manipulation cost, and we use a new robust training algorithm to optimize the quality of the trees. Our cost-aware training method can be applied to different types of tree ensembles, including gradient boosted decision trees and random forest models. Using Twitter spam detection as the case study, our evaluation results show that we can increase the attack cost by 10.6X compared to the baseline. Moreover, our robust training method using the cost-driven constraint can achieve higher accuracy, a lower false positive rate, and stronger cost-aware robustness than the state-of-the-art training method using an $L_\infty$-norm cost model. Our code is available at https://github.com/surrealyz/growtrees.
PF · Mar 18, 2025
Fake Runs, Real Fixes -- Analyzing xPU Performance Through Simulation
Ioannis Zarkadas, Amanda Tomlinson, Asaf Cidon et al.
As models become larger, ML accelerators are a scarce resource whose performance must be continually optimized to improve efficiency. Existing performance analysis tools are coarse-grained and fail to capture model performance at the machine-code level. In addition, these tools often do not provide specific recommendations for optimizations. We present xPU-Shark, a fine-grained methodology for analyzing ML models at the machine-code level that provides actionable optimization suggestions. Our core insight is to use a hardware-level simulator, an artifact of the hardware design process that we can re-purpose for performance analysis. xPU-Shark captures traces from production deployments running on accelerators and replays them in a modified microarchitecture simulator to gain low-level insights into the model's performance. We implement xPU-Shark for our in-house accelerator and use it to analyze the performance of several of our production LLMs, revealing several previously unknown microarchitecture inefficiencies. Leveraging these insights, we optimize a common communication collective by up to 15% and reduce token generation latency by up to 4.1%.
LG · May 12, 2023
Monitoring and Adapting ML Models on Mobile Devices
Wei Hao, Zixi Wang, Lauren Hong et al.
ML models are increasingly being pushed to mobile devices, for low-latency inference and offline operation. However, once the models are deployed, it is hard for ML operators to track their accuracy, which can degrade unpredictably (e.g., due to data drift). We design the first end-to-end system for continuously monitoring and adapting models on mobile devices without requiring feedback from users. Our key observation is that often model degradation is due to a specific root cause, which may affect a large group of devices. Therefore, once the system detects a consistent degradation across a large number of devices, it employs a root cause analysis to determine the origin of the problem and applies a cause-specific adaptation. We evaluate the system on two computer vision datasets, and show it consistently boosts accuracy compared to existing approaches. On a dataset containing photos collected from driving cars, our system improves the accuracy on average by 15%.
DC · Jan 6, 2022
Treehouse: A Case For Carbon-Aware Datacenter Software
Thomas Anderson, Adam Belay, Mosharaf Chowdhury et al.
The end of Dennard scaling and the slowing of Moore's Law have put the energy use of datacenters on an unsustainable path. Datacenters are already a significant fraction of worldwide electricity use, with application demand scaling at a rapid rate. We argue that substantial reductions in the carbon intensity of datacenter computing are possible with a software-centric approach: by making energy and carbon visible to application developers on a fine-grained basis, by modifying system APIs to make it possible to make informed trade-offs between performance and carbon emissions, and by raising the level of application programming to allow for flexible use of more energy-efficient means of compute and storage. We also lay out a research agenda for systems software to reduce the carbon footprint of datacenter computing.
CR · Jun 29, 2021
Privacy Budget Scheduling
Tao Luo, Mingen Pan, Pierre Tholoniat et al.
Machine learning (ML) models trained on personal data have been shown to leak information about users. Differential privacy (DP) enables model training with a guaranteed bound on this leakage. Each new model trained with DP increases the bound on data leakage and can be seen as consuming part of a global privacy budget that should not be exceeded. This budget is a scarce resource that must be carefully managed to maximize the number of successfully trained models. We describe PrivateKube, an extension to the popular Kubernetes datacenter orchestrator that adds privacy as a new type of resource to be managed alongside other traditional compute resources, such as CPU, GPU, and memory. The abstractions we design for the privacy resource mirror those defined by Kubernetes for traditional resources, but there are also major differences. For example, traditional compute resources are replenishable while privacy is not: a CPU can be regained after a model finishes execution, while privacy budget cannot. This distinction forces a re-design of the scheduler. We present DPF (Dominant Private Block Fairness) -- a variant of the popular Dominant Resource Fairness (DRF) algorithm -- that is geared toward the non-replenishable privacy resource but enjoys theoretical properties similar to those of DRF. We evaluate PrivateKube and DPF on microbenchmarks and an ML workload on Amazon Reviews data. Compared to existing baselines, DPF allows training more models under the same global privacy guarantee. This is especially true for DPF over Rényi DP, a highly composable form of DP.
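The DRF-style ordering behind a scheduler like DPF can be sketched in a few lines: a task's dominant share is the largest fraction it demands of any single private block, and tasks with smaller dominant shares are considered first. The function names, the block labels, and the numbers below are illustrative assumptions, not the PrivateKube API, and the sketch leaves out how DPF actually grants and unlocks budget over time.

```python
from typing import Dict, List, Tuple

Demand = Dict[str, float]  # block id -> epsilon requested from that block


def dominant_share(demand: Demand, capacity: Dict[str, float]) -> float:
    """Largest fraction of any single block's budget that the task asks for."""
    return max(eps / capacity[block] for block, eps in demand.items())


def order_by_dominant_share(
    tasks: List[Tuple[str, Demand]], capacity: Dict[str, float]
) -> List[str]:
    """Return task names sorted so that the smallest dominant share comes first."""
    ranked = sorted(tasks, key=lambda t: dominant_share(t[1], capacity))
    return [name for name, _ in ranked]


# Assumed per-block budgets and demands, for illustration only.
capacity = {"day1": 10.0, "day2": 10.0}
tasks = [
    ("big", {"day1": 8.0, "day2": 8.0}),
    ("small", {"day1": 2.0}),
    ("medium", {"day1": 4.0, "day2": 1.0}),
]
order = order_by_dominant_share(tasks, capacity)
```

Favoring small dominant shares matters more here than for CPU or memory, since budget spent on a large task never returns to the pool.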
LG · Oct 18, 2020
Characterizing and Taming Model Instability Across Edge Devices
Eyal Cidon, Evgenya Pergament, Zain Asgar et al.
The same machine learning model running on different edge devices may produce highly divergent outputs on a nearly identical input. Possible reasons for the divergence include differences in the device sensors, the device's signal processing hardware and software, and its operating system and processors. This paper presents the first methodical characterization of the variations in model prediction across real-world mobile devices. We demonstrate that accuracy is not a useful metric to characterize prediction divergence, and introduce a new metric, instability, which captures this variation. We characterize different sources of instability, and show that differences in compression formats and image signal processing account for significant instability in object classification models. Notably, in our experiments, 14-17% of images produced divergent classifications across one or more phone models. We evaluate three different techniques for reducing instability. In particular, we adapt prior work on making models robust to noise in order to fine-tune models to be robust to variations across edge devices. We demonstrate that our fine-tuning techniques reduce instability by 75%.
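One simple way to quantify this kind of divergence is the fraction of inputs on which the devices do not all agree. The sketch below computes that fraction; it is an assumed, simplified reading of "instability" for illustration, and the paper's exact definition may differ. The function name and the sample labels are hypothetical.

```python
from typing import List, Sequence


def instability(predictions: List[Sequence[str]]) -> float:
    """Fraction of inputs whose predicted label differs across devices.

    predictions[i] holds the labels that each device produced for input i.
    """
    if not predictions:
        return 0.0
    divergent = sum(1 for labels in predictions if len(set(labels)) > 1)
    return divergent / len(predictions)


# Four photos, each classified on three phones (made-up labels).
per_image_labels = [
    ["cat", "cat", "cat"],
    ["dog", "cat", "dog"],
    ["car", "car", "car"],
    ["bus", "bus", "truck"],
]
score = instability(per_image_labels)
```

Two devices can reach the same accuracy while disagreeing on which inputs they get wrong, which is why a per-input agreement measure reveals variation that accuracy alone hides.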
CR · Jul 28, 2020
A Large-Scale Analysis of Attacker Activity in Compromised Enterprise Accounts
Neil Shah, Grant Ho, Marco Schweighauser et al.
We present a large-scale characterization of attacker activity across 111 real-world enterprise organizations. We develop a novel forensic technique for distinguishing between attacker activity and benign activity in compromised enterprise accounts that yields few false positives and enables us to perform fine-grained analysis of attacker behavior. Applying our methods to a set of 159 compromised enterprise accounts, we quantify the duration of time attackers are active in accounts and examine thematic patterns in how attackers access and leverage these hijacked accounts. We find that attackers frequently dwell in accounts for multiple days to weeks, suggesting that delayed (non-real-time) detection can still provide significant value. Based on an analysis of the attackers' timing patterns, we observe two distinct modalities in how attackers access compromised accounts, which could be explained by the existence of a specialized market for hijacked enterprise accounts, in which one class of attackers focuses on compromising accounts and selling access to another class of attackers, who exploit the access that such hijacked accounts provide. Ultimately, our analysis sheds light on the state of enterprise account hijacking and highlights fruitful directions for a broader space of detection methods, ranging from new features that home in on malicious account behavior to the development of non-real-time detection methods that leverage malicious activity after an attack's initial point of compromise to more accurately identify attacks.
CR · Oct 2, 2019
Detecting and Characterizing Lateral Phishing at Scale
Grant Ho, Asaf Cidon, Lior Gavish et al.
We present the first large-scale characterization of lateral phishing attacks, based on a dataset of 113 million employee-sent emails from 92 enterprise organizations. In a lateral phishing attack, adversaries leverage a compromised enterprise account to send phishing emails to other users, benefitting from both the implicit trust and the information in the hijacked user's account. We develop a classifier that finds hundreds of real-world lateral phishing emails, while generating fewer than four false positives per million employee-sent emails. Drawing on the attacks we detect, as well as a corpus of user-reported incidents, we quantify the scale of lateral phishing, identify several thematic content and recipient targeting strategies that attackers follow, illuminate two types of sophisticated behaviors that attackers exhibit, and estimate the success rate of these attacks. Collectively, these results expand our mental models of the 'enterprise attacker' and shed light on the current state of enterprise phishing attacks.
LG · Nov 14, 2018
Bandana: Using Non-volatile Memory for Storing Deep Learning Models
Assaf Eisenman, Maxim Naumov, Darryl Gardner et al.
Typical large-scale recommender systems use deep learning models that are stored on a large amount of DRAM. These models often rely on embeddings, which consume most of the required memory. We present Bandana, a storage system that reduces the DRAM footprint of embeddings, by using Non-volatile Memory (NVM) as the primary storage medium, with a small amount of DRAM as cache. The main challenge in storing embeddings on NVM is its limited read bandwidth compared to DRAM. Bandana uses two primary techniques to address this limitation: first, it stores embedding vectors that are likely to be read together in the same physical location, using hypergraph partitioning, and second, it decides the number of embedding vectors to cache in DRAM by simulating dozens of small caches. These techniques allow Bandana to increase the effective read bandwidth of NVM by 2-3x and thereby significantly reduce the total cost of ownership.