CRMay 2, 2022
Using Constraint Programming and Graph Representation Learning for Generating Interpretable Cloud Security PoliciesMikhail Kazdagli, Mohit Tiwari, Akshat Kumar
Modern software systems rely on mining insights from business sensitive data stored in public clouds. A data breach usually incurs significant (monetary) loss for a commercial organization. Conceptually, cloud security heavily relies on Identity Access Management (IAM) policies that IT admins need to properly configure and periodically update. Security negligence and human errors often lead to misconfiguring IAM policies which may open a backdoor for attackers. To address these challenges, first, we develop a novel framework that encodes generating optimal IAM policies using constraint programming (CP). We identify reducing dark permissions of cloud users as an optimality criterion, which intuitively implies minimizing unnecessary datastore access permissions. Second, to make IAM policies interpretable, we use graph representation learning applied to historical access patterns of users to augment our CP model with similarity constraints: similar users should be grouped together and share common IAM policies. Third, we describe multiple attack models and show that our optimized IAM policies significantly reduce the impact of security attacks using real data from 8 commercial organizations, and synthetic instances.
CRAug 9, 2024
ConfusedPilot: Confused Deputy Risks in RAG-based LLMsAyush RoyChowdhury, Mulong Luo, Prateek Sahu et al.
Retrieval augmented generation (RAG) is a process where a large language model (LLM) retrieves useful information from a database and then generates the responses. It is becoming popular in enterprise settings for daily business operations. For example, Copilot for Microsoft 365 has accumulated millions of businesses. However, the security implications of adopting such RAG-based systems are unclear. In this paper, we introduce ConfusedPilot, a class of security vulnerabilities of RAG systems that confuse Copilot and cause integrity and confidentiality violations in its responses. First, we investigate a vulnerability that embeds malicious text in the modified prompt in RAG, corrupting the responses generated by the LLM. Second, we demonstrate a vulnerability that leaks secret data, which leverages the caching mechanism during retrieval. Third, we investigate how both vulnerabilities can be exploited to propagate misinformation within the enterprise and ultimately impact its operations, such as sales and manufacturing. We also discuss the root cause of these attacks by investigating the architecture of a RAG-based system. This study highlights the security vulnerabilities in today's RAG-based systems and proposes design guidelines to secure future RAG-based systems.
CRSep 4, 2024
Obsidian: Cooperative State-Space Exploration for Performant Inference on Secure ML AcceleratorsSarbartha Banerjee, Shijia Wei, Prakash Ramrakhyani et al.
Trusted execution environments (TEEs) for machine learning accelerators are indispensable in secure and efficient ML inference. Optimizing workloads through state-space exploration for the accelerator architectures improves performance and energy consumption. However, such explorations are expensive and slow due to the large search space. Current research has to use fast analytical models that forego critical hardware details and cross-layer opportunities unique to the hardware security primitives. While cycle-accurate models can theoretically reach better designs, their high runtime cost restricts them to a smaller state space. We present Obsidian, an optimization framework for finding the optimal mapping from ML kernels to a secure ML accelerator. Obsidian addresses the above challenge by exploring the state space using analytical and cycle-accurate models cooperatively. The two main exploration components include: (1) A secure accelerator analytical model, that includes the effect of secure hardware while traversing the large mapping state space and produce the best m model mappings; (2) A compiler profiling step on a cycle-accurate model, that captures runtime bottlenecks to further improve execution runtime, energy and resource utilization and find the optimal model mapping. We compare our results to a baseline secure accelerator, comprising of the state-of-the-art security schemes obtained from guardnn [ 33 ] and sesame [11]. The analytical model reduces the inference latency by 20.5% for a cloud and 8.4% for an edge deployment with an energy improvement of 24% and 19% respectively. The cycle-accurate model, further reduces the latency by 9.1% for a cloud and 12.2% for an edge with an energy improvement of 13.8% and 13.1%.
CRMar 12
Cascade: Composing Software-Hardware Attack Gadgets for Adversarial Threat Amplification in Compound AI SystemsSarbartha Banerjee, Prateek Sahu, Anjo Vahldiek-Oberwagner et al.
Rapid progress in generative AI has given rise to Compound AI systems - pipelines comprised of multiple large language models (LLM), software tools and database systems. Compound AI systems are constructed on a layered traditional software stack running on a distributed hardware infrastructure. Many of the diverse software components are vulnerable to traditional security flaws documented in the Common Vulnerabilities and Exposures (CVE) database, while the underlying distributed hardware infrastructure remains exposed to timing attacks, bit-flip faults, and power-based side channels. Today, research targets LLM-specific risks like model extraction, training data leakage, and unsafe generation -- overlooking the impact of traditional system vulnerabilities. This work investigates how traditional software and hardware vulnerabilities can complement LLM-specific algorithmic attacks to compromise the integrity of a compound AI pipeline. We demonstrate two novel attacks that combine system-level vulnerabilities with algorithmic weaknesses: (1) Exploiting a software code injection flaw along with a guardrail Rowhammer attack to inject an unaltered jailbreak prompt into an LLM, resulting in an AI safety violation, and (2) Manipulating a knowledge database to redirect an LLM agent to transmit sensitive user data to a malicious application, thus breaching confidentiality. These attacks highlight the need to address traditional vulnerabilities; we systematize the attack primitives and analyze their composition by grouping vulnerabilities by their objective and mapping them to distinct stages of an attack lifecycle. This approach enables a rigorous red-teaming exercise and lays the groundwork for future defense strategies.
CVDec 1, 2020Code
Emotion Detection using Image Processing in PythonRaghav Puri, Archit Gupta, Manas Sikri et al.
In this work, user's emotion using its facial expressions will be detected. These expressions can be derived from the live feed via system's camera or any pre-exisiting image available in the memory. Emotions possessed by humans can be recognized and has a vast scope of study in the computer vision industry upon which several researches have already been done. The work has been implemented using Python (2.7, Open Source Computer Vision Library (OpenCV) and NumPy. The scanned image(testing dataset) is being compared to the training dataset and thus emotion is predicted. The objective of this paper is to develop a system which can analyze the image and predict the expression of the person. The study proves that this procedure is workable and produces valid results.
CRNov 20, 2024
SoK: A Systems Perspective on Compound AI Threats and CountermeasuresSarbartha Banerjee, Prateek Sahu, Mulong Luo et al.
Large language models (LLMs) used across enterprises often use proprietary models and operate on sensitive inputs and data. The wide range of attack vectors identified in prior research - targeting various software and hardware components used in training and inference - makes it extremely challenging to enforce confidentiality and integrity policies. As we advance towards constructing compound AI inference pipelines that integrate multiple large language models (LLMs), the attack surfaces expand significantly. Attackers now focus on the AI algorithms as well as the software and hardware components associated with these systems. While current research often examines these elements in isolation, we find that combining cross-layer attack observations can enable powerful end-to-end attacks with minimal assumptions about the threat model. Given, the sheer number of existing attacks at each layer, we need a holistic and systemized understanding of different attack vectors at each layer. This SoK discusses different software and hardware attacks applicable to compound AI systems and demonstrates how combining multiple attack mechanisms can reduce the threat model assumptions required for an isolated attack. Next, we systematize the ML attacks in lines with the Mitre Att&ck framework to better position each attack based on the threat model. Finally, we outline the existing countermeasures for both software and hardware layers and discuss the necessity of a comprehensive defense strategy to enable the secure and high-performance deployment of compound AI systems.
CRFeb 24, 2025
Towards Reinforcement Learning for Exploration of Speculative Execution VulnerabilitiesEvan Lai, Wenjie Xiong, Edward Suh et al.
Speculative attacks such as Spectre can leak secret information without being discovered by the operating system. Speculative execution vulnerabilities are finicky and deep in the sense that to exploit them, it requires intensive manual labor and intimate knowledge of the hardware. In this paper, we introduce SpecRL, a framework that utilizes reinforcement learning to find speculative execution leaks in post-silicon (black box) microprocessors.
CRFeb 16, 2024
CloudLens: Modeling and Detecting Cloud Security VulnerabilitiesMikhail Kazdagli, Mohit Tiwari, Akshat Kumar
Cloud computing services provide scalable and cost-effective solutions for data storage, processing, and collaboration. With their growing popularity, concerns about security vulnerabilities are increasing. To address this, first, we provide a formal model, called CloudLens, that expresses relations between different cloud objects such as users, datastores, security roles, representing access control policies in cloud systems. Second, as access control misconfigurations are often the primary driver for cloud attacks, we develop a planning model for detecting security vulnerabilities. Such vulnerabilities can lead to widespread attacks such as ransomware, sensitive data exfiltration among others. A planner generates attacks to identify such vulnerabilities in the cloud. Finally, we test our approach on 14 real Amazon AWS cloud configurations of different commercial organizations. Our system can identify a broad range of security vulnerabilities, which state-of-the-art industry tools cannot detect.
AIOct 26, 2021
NeuroBack: Improving CDCL SAT Solving using Graph Neural NetworksWenxi Wang, Yang Hu, Mohit Tiwari et al.
Propositional satisfiability (SAT) is an NP-complete problem that impacts many research fields, such as planning, verification, and security. Mainstream modern SAT solvers are based on the Conflict-Driven Clause Learning (CDCL) algorithm. Recent work aimed to enhance CDCL SAT solvers using Graph Neural Networks (GNNs). However, so far this approach either has not made solving more effective, or required substantial GPU resources for frequent online model inferences. Aiming to make GNN improvements practical, this paper proposes an approach called NeuroBack, which builds on two insights: (1) predicting phases (i.e., values) of variables appearing in the majority (or even all) of the satisfying assignments are essential for CDCL SAT solving, and (2) it is sufficient to query the neural model only once for the predictions before the SAT solving starts. Once trained, the offline model inference allows NeuroBack to execute exclusively on the CPU, removing its reliance on GPU resources. To train NeuroBack, a new dataset called DataBack containing 120,286 data samples is created. NeuroBack is implemented as an enhancement to a state-of-the-art SAT solver called Kissat. As a result, it allowed Kissat to solve up to 5.2% and 7.4% more problems on two recent SAT competition problem sets, SATCOMP-2022 and SATCOMP-2023, respectively. NeuroBack therefore shows how machine learning can be harnessed to improve SAT solving in an effective and practical manner.
CROct 14, 2021
Bandwidth Utilization Side-Channel on ML Inference AcceleratorsSarbartha Banerjee, Shijia Wei, Prakash Ramrakhyani et al.
Accelerators used for machine learning (ML) inference provide great performance benefits over CPUs. Securing confidential model in inference against off-chip side-channel attacks is critical in harnessing the performance advantage in practice. Data and memory address encryption has been recently proposed to defend against off-chip attacks. In this paper, we demonstrate that bandwidth utilization on the interface between accelerators and the weight storage can serve a side-channel for leaking confidential ML model architecture. This side channel is independent of the type of interface, leaks even in the presence of data and memory address encryption and can be monitored through performance counters or through bus contention from an on-chip unprivileged process.
CRAug 28, 2021
Power-Based Attacks on Spatial DNN AcceleratorsGe Li, Mohit Tiwari, Michael Orshansky
With proliferation of DNN-based applications, the confidentiality of DNN model is an important commercial goal. Spatial accelerators, that parallelize matrix/vector operations, are utilized for enhancing energy efficiency of DNN computation. Recently, model extraction attacks on simple accelerators, either with a single processing element or running a binarized network, were demonstrated using the methodology derived from differential power analysis (DPA) attack on cryptographic devices. This paper investigates the vulnerability of realistic spatial accelerators using general, 8-bit, number representation. We investigate two systolic array architectures with weight-stationary dataflow: (1) a 3 $\times$ 1 array for a dot-product operation, and (2) a 3 $\times$ 3 array for matrix-vector multiplication. Both are implemented on the SAKURA-G FPGA board. We show that both architectures are ultimately vulnerable. A conventional DPA succeeds fully on the 1D array, requiring 20K power measurements. However, the 2D array exhibits higher security even with 460K traces. We show that this is because the 2D array intrinsically entails multiple MACs simultaneously dependent on the same input. However, we find that a novel template-based DPA with multiple profiling phases is able to fully break the 2D array with only 40K traces. Corresponding countermeasures need to be investigated for spatial DNN accelerators.
CRJul 14, 2020
SESAME: Software defined Enclaves to Secure Inference Accelerators with Multi-tenant ExecutionSarbartha Banerjee, Prakash Ramrakhyani, Shijia Wei et al.
Hardware-enclaves that target complex CPU designs compromise both security and performance. Programs have little control over micro-architecture, which leads to side-channel leaks, and then have to be transformed to have worst-case control- and data-flow behaviors and thus incur considerable slowdown. We propose to address these security and performance problems by bringing enclaves into the realm of accelerator-rich architectures. The key idea is to construct software-defined enclaves (SDEs) where the protections and slowdown are tied to an application-defined threat model and tuned by a compiler for the accelerator's specific domain. This vertically integrated approach requires new hardware data-structures to partition, clear, and shape the utilization of hardware resources; and a compiler that instantiates and schedules these data-structures to create multi-tenant enclaves on accelerators. We demonstrate our ideas with a comprehensive prototype -- Sesame -- that includes modifications to compiler, ISA, and microarchitecture to a decoupled access execute (DAE) accelerator framework for deep learning models. Our security evaluation shows that classifiers that could distinguish different layers in VGG, ResNet, and AlexNet, fail to do so when run using Sesame. Our synthesizable hardware prototype (on a Xilinx Pynq board) demonstrates how the compiler and micro-architecture enables threat-model-specific trade-offs in code size increase ranging from 3-7 $\%$ and run-time performance overhead for specific defenses ranging from 3.96$\%$ to 34.87$\%$ (across confidential inputs and models and single vs. multi-tenant systems).
CRMar 1, 2018
The Shape of Alerts: Detecting Malware Using Distributed Detectors by Robustly Amplifying Transient CorrelationsMikhail Kazdagli, Constantine Caramanis, Sanjay Shakkottai et al.
We introduce a new malware detector - Shape-GD - that aggregates per-machine detectors into a robust global detector. Shape-GD is based on two insights: 1. Structural: actions such as visiting a website (waterhole attack) by nodes correlate well with malware spread, and create dynamic neighborhoods of nodes that were exposed to the same attack vector. However, neighborhood sizes vary unpredictably and require aggregating an unpredictable number of local detectors' outputs into a global alert. 2. Statistical: feature vectors corresponding to true and false positives of local detectors have markedly different conditional distributions - i.e. their shapes differ. The shape of neighborhoods can identify infected neighborhoods without having to estimate neighborhood sizes - on 5 years of Symantec detectors' logs, Shape-GD reduces false positives from ~1M down to ~110K and raises alerts 345 days (on average) before commercial anti-virus products; in a waterhole attack simulated using Yahoo web-service logs, Shape-GD detects infected machines when only ~100 of ~550K are compromised.
CRAug 6, 2017
Exploiting Latent Attack Semantics for Intelligent Malware DetectionMkhail Kazdagli, Constantine Caramanis, Sanjay Shakkottai et al.
Behavioral malware detectors promise to expose previously unknown malware and are an important security primitive. However, even the best behavioral detectors suffer from high false positives and negatives. In this paper, we address the challenge of aggregating weak per-device behavioral detectors in noisy communities (i.e., ones that produce alerts at unpredictable rates) into an accurate and robust global anomaly detector (GD). Our system - Shape GD - combines two insights: Structural: actions such as visiting a website (waterhole attack) or membership in a shared email thread (phishing attack) by nodes correlate well with malware spread, and create dynamic neighborhoods of nodes that were exposed to the same attack vector; and Statistical: feature vectors corresponding to true and false positives of local detectors have markedly different conditional distributions. We use neighborhoods to amplify the transient low-dimensional structure that is latent in high-dimensional feature vectors - but neighborhoods vary unpredictably, and we use shape to extract robust neighborhood-level features that identify infected neighborhoods. Unlike prior works that aggregate local detectors' alert bitstreams or cluster the feature vectors, Shape GD analyzes the feature vectors that led to local alerts (alert-FVs) to separate true and false positives. Shape GD first filters these alert-FVs into neighborhoods and efficiently maps a neighborhood's alert-FVs' statistical shapes into a scalar score. Shape GD then acts as a neighborhood level anomaly detector - training on benign program traces to learn the ShapeScore of false positive neighborhoods, and classifying neighborhoods with anomalous ShapeScores as malicious. Shape GD detects malware early (~100 infected nodes in a ~100K node system for waterhole and ~10 of 1000 for phishing) and robustly (with ~100% global TP and ~1% global FP rates).
CRMar 9, 2016
EMMA: A New Platform to Evaluate Hardware-based Mobile Malware AnalysesMikhail Kazdagli, Ling Huang, Vijay Reddi et al.
Hardware-based malware detectors (HMDs) are a key emerging technology to build trustworthy computing platforms, especially mobile platforms. Quantifying the efficacy of HMDs against malicious adversaries is thus an important problem. The challenge lies in that real-world malware typically adapts to defenses, evades being run in experimental settings, and hides behind benign applications. Thus, realizing the potential of HMDs as a line of defense - that has a small and battery-efficient code base - requires a rigorous foundation for evaluating HMDs. To this end, we introduce EMMA - a platform to evaluate the efficacy of HMDs for mobile platforms. EMMA deconstructs malware into atomic, orthogonal actions and introduces a systematic way of pitting different HMDs against a diverse subset of malware hidden inside benign applications. EMMA drives both malware and benign programs with real user-inputs to yield an HMD's effective operating range - i.e., the malware actions a particular HMD is capable of detecting. We show that small atomic actions, such as stealing a Contact or SMS, have surprisingly large hardware footprints, and use this insight to design HMD algorithms that are less intrusive than prior work and yet perform 24.7% better. Finally, EMMA brings up a surprising new result - obfuscation techniques used by malware to evade static analyses makes them more detectable using HMDs.