CR Sep 4, 2024
Obsidian: Cooperative State-Space Exploration for Performant Inference on Secure ML Accelerators
Sarbartha Banerjee, Shijia Wei, Prakash Ramrakhyani et al.
Trusted execution environments (TEEs) for machine learning accelerators are indispensable for secure and efficient ML inference. Optimizing workloads through state-space exploration for the accelerator architectures improves performance and energy consumption. However, such explorations are expensive and slow due to the large search space. Current research has to use fast analytical models that forgo critical hardware details and cross-layer opportunities unique to the hardware security primitives. While cycle-accurate models can theoretically reach better designs, their high runtime cost restricts them to a smaller state space. We present Obsidian, an optimization framework for finding the optimal mapping from ML kernels to a secure ML accelerator. Obsidian addresses the above challenge by exploring the state space using analytical and cycle-accurate models cooperatively. The two main exploration components are: (1) a secure accelerator analytical model that includes the effect of secure hardware while traversing the large mapping state space and produces the best m model mappings; (2) a compiler profiling step on a cycle-accurate model that captures runtime bottlenecks to further improve execution runtime, energy, and resource utilization and find the optimal model mapping. We compare our results to a baseline secure accelerator comprising the state-of-the-art security schemes from GuardNN [33] and Sesame [11]. The analytical model reduces the inference latency by 20.5% for a cloud deployment and 8.4% for an edge deployment, with energy improvements of 24% and 19% respectively. The cycle-accurate model further reduces the latency by 9.1% for a cloud deployment and 12.2% for an edge deployment, with energy improvements of 13.8% and 13.1%.
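The cooperative idea in this abstract (a cheap model prunes the whole space to the best m candidates, then an expensive model re-ranks only those) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy mapping space, the `fast_cost` and `slow_cost` functions, and the name `cooperative_search` are hypothetical and are not taken from Obsidian.

```python
import heapq
from itertools import product

def cooperative_search(candidates, fast_cost, slow_cost, m):
    """Shortlist the m cheapest candidates under fast_cost,
    then pick the best of the shortlist under slow_cost."""
    shortlist = heapq.nsmallest(m, candidates, key=fast_cost)
    best = min(shortlist, key=slow_cost)
    return best, shortlist

# Hypothetical mapping space: (tile size, loop-order id).
candidates = list(product([8, 16, 32, 64], [0, 1, 2]))

def fast_cost(mapping):
    # Stand-in for an analytical model: a crude latency estimate.
    tile, order = mapping
    return 1000 / tile + 5 * order

def slow_cost(mapping):
    # Stand-in for a cycle-accurate model: same estimate plus a
    # made-up stall penalty that the fast model does not see.
    tile, order = mapping
    stall = 40 if tile == 64 else 0
    return 1000 / tile + 5 * order + stall

best, shortlist = cooperative_search(candidates, fast_cost, slow_cost, m=4)
print(shortlist)  # the four mappings the fast model prefers
print(best)       # the mapping the slow model prefers among them
```

In this toy setup the fast model ranks a tile-64 mapping first, while the slow model picks a different mapping from the shortlist because it accounts for the extra penalty. The slow model is evaluated only m times rather than once per candidate, which is the point of the two-stage split.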
CR Oct 14, 2021
Bandwidth Utilization Side-Channel on ML Inference Accelerators
Sarbartha Banerjee, Shijia Wei, Prakash Ramrakhyani et al.
Accelerators used for machine learning (ML) inference provide great performance benefits over CPUs. Securing confidential models during inference against off-chip side-channel attacks is critical to harnessing this performance advantage in practice. Data and memory address encryption has recently been proposed to defend against off-chip attacks. In this paper, we demonstrate that bandwidth utilization on the interface between accelerators and the weight storage can serve as a side channel for leaking confidential ML model architecture. This side channel is independent of the type of interface, leaks even in the presence of data and memory address encryption, and can be monitored through performance counters or through bus contention from an on-chip unprivileged process.
CR Jul 14, 2020
SESAME: Software defined Enclaves to Secure Inference Accelerators with Multi-tenant Execution
Sarbartha Banerjee, Prakash Ramrakhyani, Shijia Wei et al.
Hardware enclaves that target complex CPU designs compromise both security and performance. Programs have little control over the micro-architecture, which leads to side-channel leaks; they then have to be transformed to exhibit worst-case control- and data-flow behavior, which incurs considerable slowdown. We propose to address these security and performance problems by bringing enclaves into the realm of accelerator-rich architectures. The key idea is to construct software-defined enclaves (SDEs) where the protections and slowdown are tied to an application-defined threat model and tuned by a compiler for the accelerator's specific domain. This vertically integrated approach requires new hardware data structures to partition, clear, and shape the utilization of hardware resources, and a compiler that instantiates and schedules these data structures to create multi-tenant enclaves on accelerators. We demonstrate our ideas with a comprehensive prototype, Sesame, that includes modifications to the compiler, ISA, and microarchitecture of a decoupled access-execute (DAE) accelerator framework for deep learning models. Our security evaluation shows that classifiers that could distinguish different layers in VGG, ResNet, and AlexNet fail to do so when run using Sesame. Our synthesizable hardware prototype (on a Xilinx Pynq board) demonstrates how the compiler and micro-architecture enable threat-model-specific trade-offs, with code size increases ranging from 3% to 7% and run-time performance overhead for specific defenses ranging from 3.96% to 34.87% (across confidential inputs and models and single- vs. multi-tenant systems).
CR May 26, 2019
Shredder: Learning Noise Distributions to Protect Inference Privacy
Fatemehsadat Mireshghallah, Mohammadkazem Taram, Prakash Ramrakhyani et al.
A wide variety of deep neural applications increasingly rely on the cloud to perform their compute-heavy inference. This common practice requires sending private and privileged data over the network to remote servers, exposing it to the service provider and potentially compromising its privacy. Even if the provider is trusted, the data can still be vulnerable over communication channels or via side-channel attacks in the cloud. To that end, this paper aims to reduce the information content of the communicated data, while compromising inference accuracy as little as possible, by making the sent data noisy. An undisciplined addition of noise can significantly reduce the accuracy of inference, rendering the service unusable. To address this challenge, this paper devises Shredder, an end-to-end framework that, without altering the topology or the weights of a pre-trained network, learns additive noise distributions that significantly reduce the information content of communicated data while maintaining the inference accuracy. The key idea is to find the additive noise distributions by casting the search as a disjoint offline learning process with a loss function that strikes a balance between accuracy and information degradation. The loss function also exposes a knob for a disciplined and controlled asymmetric trade-off between privacy and accuracy. Experimentation with six real-world DNNs from text processing and image classification shows that Shredder reduces the mutual information between the input and the data communicated to the cloud by 74.70% compared to the original execution while sacrificing only 1.58% in accuracy. On average, Shredder also offers a speedup of 1.79x over Wi-Fi and 2.17x over LTE compared to cloud-only execution when using an off-the-shelf mobile GPU (Tegra X2) on the edge.
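The notion of a loss that balances accuracy against information degradation, with a knob between the two, can be pictured with a toy sketch. This is not Shredder's actual loss or training procedure. It assumes a single frozen linear layer `w`, one learned noise vector rather than a distribution, an accuracy term equal to the squared change in the layer's output, and a privacy term that rewards larger noise magnitude, weighted by a hypothetical knob `lam`. Because the toy layer is linear, the output change depends only on the noise, so the activation itself does not appear.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def sign(x):
    return (x > 0) - (x < 0)

def learn_noise(w, noise, lam=0.1, lr=0.05, steps=200):
    """Gradient descent on the noise only; the weights w stay frozen.
    Toy loss: (w . noise)**2 - lam * sum(|noise_i|)."""
    noise = list(noise)
    for _ in range(steps):
        drift = dot(w, noise)  # change in the frozen layer's output
        grad = [2 * drift * wi - lam * sign(ni) for wi, ni in zip(w, noise)]
        noise = [ni - lr * gi for ni, gi in zip(noise, grad)]
    return noise

w = [1.0, 1.0]       # frozen weights (hypothetical)
start = [0.5, -0.2]  # initial noise (hypothetical)
learned = learn_noise(w, start)

output_drift = abs(dot(w, learned))        # accuracy proxy: should shrink
noise_size = sum(abs(n) for n in learned)  # privacy proxy: should grow
print(output_drift, noise_size)
```

In this toy the noise grows along directions the frozen layer ignores while its effect on the output decays, and raising `lam` makes the noise grow faster. Since the toy loss is unbounded below, the step count acts as the stopping rule here.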