12.0CRApr 30Code
DRAMatic Speedup: Accelerating HE Operations on a Processing-in-Memory SystemNiklas Klinger, Jonas Sander, Peterson Yuhala et al.
Homomorphic encryption (HE) is a promising technology for confidential cloud computing, as it allows computations on encrypted data. However, HE is computationally expensive and often memory-bound on conventional computer architectures. Processing-in-Memory (PIM) is an alternative hardware architecture that integrates processing units and memory on the same chip or memory module. PIM enables higher memory bandwidth than conventional architectures and could thus be suitable for accelerating HE. We present DRAMatic, which implements operations foundational to HE on UPMEM PIM -- a programmable general-purpose PIM system developed by UPMEM. DRAMatic incorporates many arithmetic optimizations, including residue number system and number-theoretic transform techniques, and can support the large parameters required for secure homomorphic evaluations. It achieves a 334 times speed-up compared to previous HE implementations on UPMEM PIM. We also evaluate DRAMatic against Microsoft SEAL, a popular open-source HE library, regarding both runtime and energy efficiency. The results show that DRAMatic significantly closes the gap between Microsoft SEAL and HE implementations on UPMEM PIM. However, we also show that DRAMatic is currently constrained by data transfer overhead and limited multiplication performance on UPMEM PIM hardware. Finally, we discuss potential hardware extensions to UPMEM PIM.
CRFeb 13, 2023
Dash: Accelerating Distributed Private Convolutional Neural Network Inference with Arithmetic Garbled CircuitsJonas Sander, Sebastian Berndt, Ida Bruhns et al.
The adoption of machine learning solutions is rapidly increasing across all parts of society. As the models grow larger, both training and inference of machine learning models is increasingly outsourced, e.g. to cloud service providers. This means that potentially sensitive data is processed on untrusted platforms, which bears inherent data security and privacy risks. In this work, we investigate how to protect distributed machine learning systems, focusing on deep convolutional neural networks. The most common and best-performing mixed MPC approaches are based on HE, secret sharing, and garbled circuits. They commonly suffer from large performance overheads, big accuracy losses, and communication overheads that grow linearly in the depth of the neural network. To improve on these problems, we present Dash, a fast and distributed private convolutional neural network inference scheme secure against malicious attackers. Building on arithmetic garbling gadgets [BMR16] and fancy-garbling [BCM+19], Dash is based purely on arithmetic garbled circuits. We introduce LabelTensors that allow us to leverage the massive parallelity of modern GPUs. Combined with state-of-the-art garbling optimizations, Dash outperforms previous garbling approaches up to a factor of about 100. Furthermore, we introduce an efficient scaling operation over the residues of the Chinese remainder theorem representation to arithmetic garbled circuits, which allows us to garble larger networks and achieve much higher accuracy than previous approaches. Finally, Dash requires only a single communication round per inference step, regardless of the depth of the neural network, and a very small constant online communication volume.
CRJul 28, 2025
Zebrafix: Mitigating Memory-Centric Side-Channel Leakage via InterleavingAnna Pätschke, Jan Wichelmann, Thomas Eisenbarth
Constant-time code has become the de-facto standard for secure cryptographic implementations. However, some memory-based leakage classes such as ciphertext side-channels and silent stores remain unaddressed. Prior work proposed three different methods for ciphertext side-channel mitigation, for which one, the practicality of interleaving data with counter values, remains to be explored. To close this gap, we define design choices and requirements to leverage interleaving for a generic ciphertext side-channel mitigation. Based on these results, we implement Zebrafix, a compiler-based tool to ensure freshness of memory stores. We evaluate Zebrafix and find that interleaving can perform much better than other ciphertext side-channel mitigations, at the cost of a high practical complexity. We further observe that ciphertext side-channels and silent stores belong to a broader attack category: memory-centric side-channels. Under this unified view, we show that interleaving-based ciphertext side-channel mitigations can be used to prevent silent stores as well.
50.9CRMay 15
uGen: An Agentic Framework for Generating Microarchitectural Attack PoCsDebopriya Roy Dipta, Thore Tiemann, Eduard Marin et al.
Microarchitectural attacks continue to evolve, uncovering new exploitation vectors in modern processors. From a defensive perspective, assessing a system's susceptibility to such attacks remains challenging. Developing functional attack implementations is labor-intensive, requires deep microarchitectural expertise, and is highly sensitive to execution environments. Consequently, existing attacks often lack portability, limiting systematic and scalable vulnerability assessment. Recent advances in large language models (LLMs) suggest a potential avenue for lowering these barriers. However, it remains unclear whether LLMs can reliably generate functionally correct microarchitectural attack code suitable for rigorous vulnerability testing. In this work, we present uGen, the first LLM-driven framework for automated microarchitectural attack code generation. A key challenge we address is identifying attack-specific knowledge gaps in LLMs. Through a systematic study of state-of-the-art models (GPT, Claude, and Qwen3), we find that LLMs frequently misgenerate or misplace critical attack primitives. Guided by this analysis, uGen employs a retrieval-augmented, multi-agent design that injects missing domain knowledge to synthesize functionally correct microarchitectural attack PoCs tailored to defender requirements. We evaluate uGen on cache-based and speculative-execution attacks across diverse set of microarchitectures, vulnerable functions, and LLM platforms. In the deployment stage, uGen achieves up to 100% success rate for Spectre-v1 (Claude Sonnet-4) and 80% for Prime+Probe (Qwen3-Coder). Finally, we demonstrate that uGen can generate a successful PoC code with a cost of $1.25 in under four minutes.
49.7SEMay 13
Code-Centric Detection of Vulnerability-Fixing Commits: A Unified Benchmark and Empirical StudyNils Loose, Joseph Bienhüls, Kristoffer Hempel et al.
Automated detection of vulnerability-fixing commits (VFCs) is critical for timely security patch deployment, as advisory databases lag patch releases by a median of 25 days and many fixes never receive advisories. We present a comprehensive evaluation of code language model based VFC detection through a unified framework consolidating over 20 fragmented datasets spanning more than 180000 commits. Across over 180 experiments with fine-tuned models from 125 M to 14 B parameters, we find no evidence that models acquire transferable security-relevant code understanding from code changes alone. When commit messages are available, they dominate model attention, and when removed, an attribution analysis shows that enriching diffs with additional intra-procedural semantic context does not shift model attention toward the code changes. Group-stratified evaluation exposes approximately 17% performance drops compared to random splits, while temporal splits on aggregated datasets prove unreliable due to compositional shift in the underlying project distributions. At a false positive rate of 0.5% all fine-tuned code-only models miss over 93% of vulnerabilities. Larger and more diverse training data or generative approaches show preliminary improvements but do not resolve the underlying limitations. To support future research on code-centric VFC detection, we release our unified framework and evaluation suite.
29.2CRApr 20
TrEEStealer: Stealing Decision Trees via Enclave Side ChannelsJonas Sander, Anja Rabich, Nick Mahling et al.
Today, machine learning is widely applied in sensitive, security-related, and financially lucrative applications. Model extraction attacks undermine current business models where a model owner sells model access, e.g., via MLaaS APIs. Additionally, stolen models can enable powerful white-box attacks, facilitating privacy attacks on sensitive training data, and model evasion. In this paper, we focus on Decision Trees (DT), which are widely deployed in practice. Existing black-box extraction attacks for DTs are either query-intensive, make strong assumptions about the DT structure, or rely on rich API information. To limit attacks to the black-box setting, CPU vendors introduced Trusted Execution Environments (TEE) that use hardware-mechanisms to isolate workloads from external parties, e.g., MLaaS providers. We introduce TrEEStealer, a high-fidelity extraction attack for stealing TEE-protected DTs. TrEEStealer exploits TEE-specific side-channels to steal DTs efficiently and without strong assumptions about the API output or DT structure. The extraction efficacy stems from a novel algorithm that maximizes the information derived from each query by coupling Control-Flow Information (CFI) with passive information tracking. We use two primitives to acquire CFI: for AMD SEV, we follow previous work using the SEV-Step framework and performance counters. For Intel SGX, we reproduce prior findings on current Xeon 6 CPUs and construct a new primitive to efficiently extract the branch history of inference runs through the Branch-History-Register. We found corresponding vulnerabilities in three popular libraries: OpenCV, mlpack, and emlearn. We show that TrEEStealer achieves superior efficiency and extraction fidelity compared to prior attacks. Our work establishes a new state-of-the-art for DT extraction and confirms that TEEs fail to protect against control-flow leakage.
AIDec 6, 2024
OCEAN: Open-World Contrastive Authorship IdentificationFelix Mächtle, Jan-Niclas Serr, Nils Loose et al.
In an era where cyberattacks increasingly target the software supply chain, the ability to accurately attribute code authorship in binary files is critical to improving cybersecurity measures. We propose OCEAN, a contrastive learning-based system for function-level authorship attribution. OCEAN is the first framework to explore code authorship attribution on compiled binaries in an open-world and extreme scenario, where two code samples from unknown authors are compared to determine if they are developed by the same author. To evaluate OCEAN, we introduce new realistic datasets: CONAN, to improve the performance of authorship attribution systems in real-world use cases, and SNOOPY, to increase the robustness of the evaluation of such systems. We use CONAN to train our model and evaluate on SNOOPY, a fully unseen dataset, resulting in an AUROC score of 0.86 even when using high compiler optimizations. We further show that CONAN improves performance by 7% compared to the previously used Google Code Jam dataset. Additionally, OCEAN outperforms previous methods in their settings, achieving a 10% improvement over state-of-the-art SCS-Gan in scenarios analyzing source code. Furthermore, OCEAN can detect code injections from an unknown author in a software update, underscoring its value for securing software supply chains.
CRApr 18, 2025
Trace Gadgets: Minimizing Code Context for Machine Learning-Based Vulnerability PredictionFelix Mächtle, Nils Loose, Tim Schulz et al.
As the number of web applications and API endpoints exposed to the Internet continues to grow, so does the number of exploitable vulnerabilities. Manually identifying such vulnerabilities is tedious. Meanwhile, static security scanners tend to produce many false positives. While machine learning-based approaches are promising, they typically perform well only in scenarios where training and test data are closely related. A key challenge for ML-based vulnerability detection is providing suitable and concise code context, as excessively long contexts negatively affect the code comprehension capabilities of machine learning models, particularly smaller ones. This work introduces Trace Gadgets, a novel code representation that minimizes code context by removing non-related code. Trace Gadgets precisely capture the statements that cover the path to the vulnerability. As input for ML models, Trace Gadgets provide a minimal but complete context, thereby improving the detection performance. Moreover, we collect a large-scale dataset generated from real-world applications with manually curated labels to further improve the performance of ML-based vulnerability detectors. Our results show that state-of-the-art machine learning models perform best when using Trace Gadgets compared to previous code representations, surpassing the detection capabilities of industry-standard static scanners such as GitHub's CodeQL by at least 4% on a fully unseen dataset. By applying our framework to real-world applications, we identify and report previously unknown vulnerabilities in widely deployed software.
SESep 10, 2025
AutoStub: Genetic Programming-Based Stub Creation for Symbolic ExecutionFelix Mächtle, Nils Loose, Jan-Niclas Serr et al.
Symbolic execution is a powerful technique for software testing, but suffers from limitations when encountering external functions, such as native methods or third-party libraries. Existing solutions often require additional context, expensive SMT solvers, or manual intervention to approximate these functions through symbolic stubs. In this work, we propose a novel approach to automatically generate symbolic stubs for external functions during symbolic execution that leverages Genetic Programming. When the symbolic executor encounters an external function, AutoStub generates training data by executing the function on randomly generated inputs and collecting the outputs. Genetic Programming then derives expressions that approximate the behavior of the function, serving as symbolic stubs. These automatically generated stubs allow the symbolic executor to continue the analysis without manual intervention, enabling the exploration of program paths that were previously intractable. We demonstrate that AutoStub can automatically approximate external functions with over 90% accuracy for 55% of the functions evaluated, and can infer language-specific behaviors that reveal edge cases crucial for software testing.
CRApr 16, 2024
Dynamic Frequency-Based Fingerprinting Attacks against Modern Sandbox EnvironmentsDebopriya Roy Dipta, Thore Tiemann, Berk Gulmezoglu et al.
The cloud computing landscape has evolved significantly in recent years, embracing various sandboxes to meet the diverse demands of modern cloud applications. These sandboxes encompass container-based technologies like Docker and gVisor, microVM-based solutions like Firecracker, and security-centric sandboxes relying on Trusted Execution Environments (TEEs) such as Intel SGX and AMD SEV. However, the practice of placing multiple tenants on shared physical hardware raises security and privacy concerns, most notably side-channel attacks. In this paper, we investigate the possibility of fingerprinting containers through CPU frequency reporting sensors in Intel and AMD CPUs. One key enabler of our attack is that the current CPU frequency information can be accessed by user-space attackers. We demonstrate that Docker images exhibit a unique frequency signature, enabling the distinction of different containers with up to 84.5% accuracy even when multiple containers are running simultaneously in different cores. Additionally, we assess the effectiveness of our attack when performed against several sandboxes deployed in cloud environments, including Google's gVisor, AWS' Firecracker, and TEE-based platforms like Gramine (utilizing Intel SGX) and AMD SEV. Our empirical results show that these attacks can also be carried out successfully against all of these sandboxes in less than 40 seconds, with an accuracy of over 70% in all cases. Finally, we propose a noise injection-based countermeasure to mitigate the proposed attack on cloud environments.
6.1CRMar 31
HPCCFA: Leveraging Hardware Performance Counters for Control Flow AttestationClaudius Pott, Luca Wilke, Jan Wichelmann et al.
Trusted Execution Environments (TEEs) allow the secure execution of code on remote systems without the need to trust their operators. They use static attestation as a central mechanism for establishing trust, allowing remote parties to verify that their code is executed unmodified in an isolated environment. However, this form of attestation does not cover runtime attacks, where an attacker exploits vulnerabilities in the software inside the TEE. Control Flow Attestation (CFA), a form of runtime attestation, is designed to detect such attacks. In this work, we present a method to extend TEEs with CFA and discuss how it can prevent exploitation in the event of detected control flow violations. Furthermore, we introduce HPCCFA, a mechanism that uses HPCs for CFA purposes, enabling hardware-backed trace generation on commodity CPUs. We demonstrate the feasibility of HPCCFA on a proof-of-concept implementation for Keystone on RISC-V. Our evaluation investigates the interplay of the number of measurement points and runtime protection, and reveals a trade-off between detection reliability and performance overhead.
SEJan 19
Beyond Accuracy: Characterizing Code Comprehension Capabilities in (Large) Language ModelsFelix Mächtle, Jan-Niclas Serr, Nils Loose et al.
Large Language Models (LLMs) are increasingly integrated into software engineering workflows, yet current benchmarks provide only coarse performance summaries that obscure the diverse capabilities and limitations of these models. This paper investigates whether LLMs' code-comprehension performance aligns with traditional human-centric software metrics or instead reflects distinct, non-human regularities. We introduce a diagnostic framework that reframes code understanding as a binary input-output consistency task, enabling the evaluation of classification and generative models. Using a large-scale dataset, we correlate model performance with traditional, human-centric complexity metrics, such as lexical size, control-flow complexity, and abstract syntax tree structure. Our analyses reveal minimal correlation between human-defined metrics and LLM success (AUROC 0.63), while shadow models achieve substantially higher predictive performance (AUROC 0.86), capturing complex, partially predictable patterns beyond traditional software measures. These findings suggest that LLM comprehension reflects model-specific regularities only partially accessible through either human-designed or learned features, emphasizing the need for benchmark methodologies that move beyond aggregate accuracy and toward instance-level diagnostics, while acknowledging fundamental limits in predicting correct outcomes.
CRSep 11, 2025
Prompt Pirates Need a Map: Stealing Seeds helps Stealing PromptsFelix Mächtle, Ashwath Shetty, Jonas Sander et al.
Diffusion models have significantly advanced text-to-image generation, enabling the creation of highly realistic images conditioned on textual prompts and seeds. Given the considerable intellectual and economic value embedded in such prompts, prompt theft poses a critical security and privacy concern. In this paper, we investigate prompt-stealing attacks targeting diffusion models. We reveal that numerical optimization-based prompt recovery methods are fundamentally limited as they do not account for the initial random noise used during image generation. We identify and exploit a noise-generation vulnerability (CWE-339), prevalent in major image-generation frameworks, originating from PyTorch's restriction of seed values to a range of $2^{32}$ when generating the initial random noise on CPUs. Through a large-scale empirical analysis conducted on images shared via the popular platform CivitAI, we demonstrate that approximately 95% of these images' seed values can be effectively brute-forced in 140 minutes per seed using our seed-recovery tool, SeedSnitch. Leveraging the recovered seed, we propose PromptPirate, a genetic algorithm-based optimization method explicitly designed for prompt stealing. PromptPirate surpasses state-of-the-art methods, i.e., PromptStealer, P2HP, and CLIP-Interrogator, achieving an 8-11% improvement in LPIPS similarity. Furthermore, we introduce straightforward and effective countermeasures that render seed stealing, and thus optimization-based prompt stealing, ineffective. We have disclosed our findings responsibly and initiated coordinated mitigation efforts with the developers to address this critical vulnerability.
LGAug 7, 2025
Non-omniscient backdoor injection with a single poison sample: Proving the one-poison hypothesis for linear regression and linear classificationThorsten Peinemann, Paula Arnold, Sebastian Berndt et al.
Backdoor injection attacks are a threat to machine learning models that are trained on large data collected from untrusted sources; these attacks enable attackers to inject malicious behavior into the model that can be triggered by specially crafted inputs. Prior work has established bounds on the success of backdoor attacks and their impact on the benign learning task, however, an open question is what amount of poison data is needed for a successful backdoor attack. Typical attacks either use few samples, but need much information about the data points or need to poison many data points. In this paper, we formulate the one-poison hypothesis: An adversary with one poison sample and limited background knowledge can inject a backdoor with zero backdooring-error and without significantly impacting the benign learning task performance. Moreover, we prove the one-poison hypothesis for linear regression and linear classification. For adversaries that utilize a direction that is unused by the benign data distribution for the poison sample, we show that the resulting model is functionally equivalent to a model where the poison was excluded from training. We build on prior work on statistical backdoor learning to show that in all other cases, the impact on the benign learning task is still limited. We also validate our theoretical results experimentally with realistic benchmark data sets.
CRFeb 23, 2022
IOTLB-SC: An Accelerator-Independent Leakage Source in Modern Cloud SystemsThore Tiemann, Zane Weissman, Thomas Eisenbarth et al.
Hardware peripherals such as GPUs and FPGAs are commonly available in server-grade computing to accelerate specific compute tasks, from database queries to machine learning. CSPs have integrated these accelerators into their infrastructure and let tenants combine and configure these components flexibly, based on their needs. Securing I/O interfaces is critical to ensure proper isolation between tenants in these highly complex, heterogeneous, yet shared server systems, especially in the cloud, where some peripherals may be under control of a malicious tenant. In this work, we investigate the interfaces that connect peripheral hardware components to each other and the rest of the system.We show that the I/O memory management units (IOMMUs) - intended to ensure proper isolation of peripherals - are the source of a new attack surface: the I/O translation look-aside buffer (IOTLB). We show that by using an FPGA accelerator card one can gain precise information over IOTLB activity. That information can be used for covert communication between peripherals without bothering CPU or to directly extract leakage from neighboring accelerated compute jobs such as GPU-accelerated databases. We present the first qualitative and quantitative analysis of this newly uncovered attack surface before fine-grained channels become widely viable with the introduction of CXL and PCIe 5.0. In addition, we propose possible countermeasures that software developers, hardware designers, and system administrators can use to suppress the observed side-channel leakages and analyze their implicit costs.
CRAug 10, 2021
Util::Lookup: Exploiting key decoding in cryptographic librariesFlorian Sieck, Sebastian Berndt, Jan Wichelmann et al.
Implementations of cryptographic libraries have been scrutinized for secret-dependent execution behavior exploitable by microarchitectural side-channel attacks. To prevent unintended leakages, most libraries moved to constant-time implementations of cryptographic primitives. There have also been efforts to certify libraries for use in sensitive areas, like Microsoft CNG and Botan, with specific attention to leakage behavior. In this work, we show that a common oversight in these libraries is the existence of \emph{utility functions}, which handle and thus possibly leak confidential information. We analyze the exploitability of base64 decoding functions across several widely used cryptographic libraries. Base64 decoding is used when loading keys stored in PEM format. We show that these functions by themselves leak sufficient information even if libraries are executed in trusted execution environments. In fact, we show that recent countermeasures to transient execution attacks such as LVI \emph{ease} the exploitability of the observed faint leakages, allowing us to robustly infer sufficient information about RSA private keys \emph{with a single trace}. We present a complete attack, including a broad library analysis, a high-resolution last level cache attack on SGX enclaves, and a fully parallelized implementation of the extend-and-prune approach that allows a complete key recovery at medium costs.
CRJun 29, 2021
undeSErVed trust: Exploiting Permutation-Agnostic Remote AttestationLuca Wilke, Jan Wichelmann, Florian Sieck et al.
The ongoing trend of moving data and computation to the cloud is met with concerns regarding privacy and protection of intellectual property. Cloud Service Providers (CSP) must be fully trusted to not tamper with or disclose processed data, hampering adoption of cloud services for many sensitive or critical applications. As a result, CSPs and CPU manufacturers are rushing to find solutions for secure outsourced computation in the Cloud. While enclaves, like Intel SGX, are strongly limited in terms of throughput and size, AMD's Secure Encrypted Virtualization (SEV) offers hardware support for transparently protecting code and data of entire VMs, thus removing the performance, memory and software adaption barriers of enclaves. Through attestation of boot code integrity and means for securely transferring secrets into an encrypted VM, CSPs are effectively removed from the list of trusted entities. There have been several attacks on the security of SEV, by abusing I/O channels to encrypt and decrypt data, or by moving encrypted code blocks at runtime. Yet, none of these attacks have targeted the attestation protocol, the core of the secure computing environment created by SEV. We show that the current attestation mechanism of Zen 1 and Zen 2 architectures has a significant flaw, allowing us to manipulate the loaded code without affecting the attestation outcome. An attacker may abuse this weakness to inject arbitrary code at startup -- and thus take control over the entire VM execution, without any indication to the VM's owner. Our attack primitives allow the attacker to do extensive modifications to the bootloader and the operating system, like injecting spy code or extracting secret data. We present a full end-to-end attack, from the initial exploit to leaking the key of the encrypted disk image during boot, giving the attacker unthrottled access to all of the VM's persistent data.
CRAug 27, 2020
CACHE SNIPER : Accurate timing control of cache evictionsSamira Briongos, Ida Bruhns, Pedro Malagón et al.
Microarchitectural side channel attacks have been very prominent in security research over the last few years. Caches have been an outstanding covert channel, as they provide high resolution and generic cross-core leakage even with simple user-mode code execution privileges. To prevent these generic cross-core attacks, all major cryptographic libraries now provide countermeasures to hinder key extraction via cross-core cache attacks, for instance avoiding secret dependent access patterns and prefetching data. In this paper, we show that implementations protected by 'good-enough' countermeasures aimed at preventing simple cache attacks are still vulnerable. We present a novel attack that uses a special timing technique to determine when an encryption has started and then evict the data precisely at the desired instant. This new attack does not require special privileges nor explicit synchronization between the attacker and the victim. One key improvement of our attack is a method to evict data from the cache with a single memory access and in absence of shared memory by leveraging the transient capabilities of TSX and relying on the recently reverse-engineered L3 replacement policy. We demonstrate the efficiency by performing an asynchronous last level cache attack to extract an RSA key from the latest wolfSSL library, which has been especially adapted to avoid leaky access patterns, and by extracting an AES key from the S-Box implementation included in OpenSSL bypassing the per round prefetch intended as a protection against cache attacks.
CRApr 23, 2020
SEVurity: No Security Without Integrity -- Breaking Integrity-Free Memory Encryption with Minimal AssumptionsLuca Wilke, Jan Wichelmann, Mathias Morbitzer et al.
One reason for not adopting cloud services is the required trust in the cloud provider: As they control the hypervisor, any data processed in the system is accessible to them. Full memory encryption for Virtual Machines (VM) protects against curious cloud providers as well as otherwise compromised hypervisors. AMD Secure Encrypted Virtualization (SEV) is the most prevalent hardware-based full memory encryption for VMs. Its newest extension, SEV-ES, also protects the entire VM state during context switches, aiming to ensure that the host neither learns anything about the data that is processed inside the VM, nor is able to modify its execution state. Several previous works have analyzed the security of SEV and have shown that, by controlling I/O, it is possible to exfiltrate data or even gain control over the VM's execution. In this work, we introduce two new methods that allow us to inject arbitrary code into SEV-ES secured virtual machines. Due to the lack of proper integrity protection, it is sufficient to reuse existing ciphertext to build a high-speed encryption oracle. As a result, our attack no longer depends on control over the I/O, which is needed by prior attacks. As I/O manipulation is highly detectable, our attacks are stealthier. In addition, we reverse-engineer the previously unknown, improved Xor-Encrypt-Xor (XEX) based encryption mode, that AMD is using on updated processors, and show, for the first time, how it can be overcome by our new attacks.
CRDec 24, 2019
JackHammer: Efficient Rowhammer on Heterogeneous FPGA-CPU PlatformsZane Weissman, Thore Tiemann, Daniel Moghimi et al.
After years of development, FPGAs are finally making an appearance on multi-tenant cloud servers. These heterogeneous FPGA-CPU architectures break common assumptions about isolation and security boundaries. Since the FPGA and CPU architectures share hardware resources, a new class of vulnerabilities requires us to reassess the security and dependability of these platforms. In this work, we analyze the memory and cache subsystem and study Rowhammer and cache attacks enabled on two proposed heterogeneous FPGA-CPU platforms by Intel: the Arria 10 GX with an integrated FPGA-CPU platform, and the Arria 10 GX PAC expansion card which connects the FPGA to the CPU via the PCIe interface. We show that while Intel PACs currently are immune to cache attacks from FPGA to CPU, the integrated platform is indeed vulnerable to Prime and Probe style attacks from the FPGA to the CPU's last level cache. Further, we demonstrate JackHammer, a novel and efficient Rowhammer from the FPGA to the host's main memory. Our results indicate that a malicious FPGA can perform twice as fast as a typical Rowhammer attack from the CPU on the same system and causes around four times as many bit flips as the CPU attack. We demonstrate the efficacy of JackHammer from the FPGA through a realistic fault attack on the WolfSSL RSA signing implementation that reliably causes a fault after an average of fifty-eight RSA signatures, 25% faster than a CPU rowhammer attack. In some scenarios our JackHammer attack produces faulty signatures more than three times more often and almost three times faster than a conventional CPU rowhammer attack.
CRNov 13, 2019
TPM-FAIL: TPM meets Timing and Lattice AttacksDaniel Moghimi, Berk Sunar, Thomas Eisenbarth et al.
Trusted Platform Module (TPM) serves as a hardware-based root of trust that protects cryptographic keys from privileged system and physical adversaries. In this work, we perform a black-box timing analysis of TPM 2.0 devices deployed on commodity computers. Our analysis reveals that some of these devices feature secret-dependent execution times during signature generation based on elliptic curves. In particular, we discovered timing leakage on an Intel firmware-based TPM as well as a hardware TPM. We show how this information allows an attacker to apply lattice techniques to recover 256-bit private keys for ECDSA and ECSchnorr signatures. On Intel fTPM, our key recovery succeeds after about 1,300 observations and in less than two minutes. Similarly, we extract the private ECDSA key from a hardware TPM manufactured by STMicroelectronics, which is certified at Common Criteria (CC) EAL 4+, after fewer than 40,000 observations. We further highlight the impact of these vulnerabilities by demonstrating a remote attack against a StrongSwan IPsec VPN that uses a TPM to generate the digital signatures for authentication. In this attack, the remote client recovers the server's private authentication key by timing only 45,000 authentication handshakes via a network connection. The vulnerabilities we have uncovered emphasize the difficulty of correctly implementing known constant-time techniques, and show the importance of evolutionary testing and transparent evaluation of cryptographic implementations. Even certified devices that claim resistance against attacks require additional scrutiny by the community and industry, as we learn more about these attacks.
CRJul 8, 2019
FortuneTeller: Predicting Microarchitectural Attacks via Unsupervised Deep LearningBerk Gulmezoglu, Ahmad Moghimi, Thomas Eisenbarth et al.
The growing security threat of microarchitectural attacks underlines the importance of robust security sensors and detection mechanisms at the hardware level. While there are studies on runtime detection of cache attacks, a generic model to consider the broad range of existing and future attacks is missing. Unfortunately, previous approaches only consider either a single attack variant, e.g. Prime+Probe, or specific victim applications such as cryptographic implementations. Furthermore, the state-of-the art anomaly detection methods are based on coarse-grained statistical models, which are not successful to detect anomalies in a large-scale real world systems. Thanks to the memory capability of advanced Recurrent Neural Networks (RNNs) algorithms, both short and long term dependencies can be learned more accurately. Therefore, we propose FortuneTeller, which for the first time leverages the superiority of RNNs to learn complex execution patterns and detects unseen microarchitectural attacks in real world systems. FortuneTeller models benign workload pattern from a microarchitectural standpoint in an unsupervised fashion, and then, it predicts how upcoming benign executions are supposed to behave. Potential attacks and malicious behaviors will be detected automatically, when there is a discrepancy between the predicted execution pattern and the runtime observation. We implement FortuneTeller based on the available hardware performance counters on Intel processors and it is trained with 10 million samples obtained from benign applications. For the first time, the latest attacks such as Meltdown, Spectre, Rowhammer and Zombieload are detected with one trained model and without observing these attacks during the training. We show that FortuneTeller achieves F-score of 0.9970.
CRApr 12, 2019
RELOAD+REFRESH: Abusing Cache Replacement Policies to Perform Stealthy Cache AttacksSamira Briongos, Pedro Malagón, José M. Moya et al.
Caches have become the prime method for unintended information extraction across logical isolation boundaries. Even Spectre and Meltdown rely on the cache side channel, as it provides great resolution and is widely available on all major CPU platforms. As a consequence, several methods to stop cache attacks by detecting them have been proposed. Detection is strongly aided by the fact that observing cache activity of co-resident processes is not possible without altering the cache state and thereby forcing evictions on the observed processes. In this work, we show that this widely held assumption is incorrect. Through clever usage of the cache replacement policy it is possible to track a victims process cache accesses without forcing evictions on the victim's data. Hence, online detection mechanisms that rely on these evictions can be circumvented as they do not detect be the introduced RELOAD+REFRESH attack. The attack requires a profound understanding of the cache replacement policy. We present a methodology to recover the replacement policy and apply it to the last five generations of Intel processors. We further show empirically that the performance of RELOAD+REFRESH on cryptographic implementations is comparable to that of other widely used cache attacks, while its detectability becomes extremely difficult, due to the negligible effect on the victims cache access pattern.
CRMar 1, 2019
SPOILER: Speculative Load Hazards Boost Rowhammer and Cache AttacksSaad Islam, Ahmad Moghimi, Ida Bruhns et al.
Modern microarchitectures incorporate optimization techniques such as speculative loads and store forwarding to improve the memory bottleneck. The processor executes the load speculatively before the stores, and forwards the data of a preceding store to the load if there is a potential dependency. This enhances performance since the load does not have to wait for preceding stores to complete. However, the dependency prediction relies on partial address information, which may lead to false dependencies and stall hazards. In this work, we are the first to show that the dependency resolution logic that serves the speculative load can be exploited to gain information about the physical page mappings. Microarchitectural side-channel attacks such as Rowhammer and cache attacks like Prime+Probe rely on the reverse engineering of the virtual-to-physical address mapping. We propose the SPOILER attack which exploits this leakage to speed up this reverse engineering by a factor of 256. Then, we show how this can improve the Prime+Probe attack by a 4096 factor speed up of the eviction set search, even from sandboxed environments like JavaScript. Finally, we improve the Rowhammer attack by showing how SPOILER helps to conduct DRAM row conflicts deterministically with up to 100% chance, and by demonstrating a double-sided Rowhammer attack with normal user's privilege. The later is due to the possibility of detecting contiguous memory pages using the SPOILER leakage.
CRNov 27, 2018
Undermining User Privacy on Mobile Devices Using AIBerk Gulmezoglu, Andreas Zankl, M. Caner Tol et al.
Over the past years, literature has shown that attacks exploiting the microarchitecture of modern processors pose a serious threat to the privacy of mobile phone users. This is because applications leave distinct footprints in the processor, which can be used by malware to infer user activities. In this work, we show that these inference attacks are considerably more practical when combined with advanced AI techniques. In particular, we focus on profiling the activity in the last-level cache (LLC) of ARM processors. We employ a simple Prime+Probe based monitoring technique to obtain cache traces, which we classify with Deep Learning methods including Convolutional Neural Networks. We demonstrate our approach on an off-the-shelf Android phone by launching a successful attack from an unprivileged, zeropermission App in well under a minute. The App thereby detects running applications with an accuracy of 98% and reveals opened websites and streaming videos by monitoring the LLC for at most 6 seconds. This is possible, since Deep Learning compensates measurement disturbances stemming from the inherently noisy LLC monitoring and unfavorable cache characteristics such as random line replacement policies. In summary, our results show that thanks to advanced AI techniques, inference attacks are becoming alarmingly easy to implement and execute in practice. This once more calls for countermeasures that confine microarchitectural leakage and protect mobile phone applications, especially those valuing the privacy of their users.
CRAug 16, 2018
MicroWalk: A Framework for Finding Side Channels in BinariesJan Wichelmann, Ahmad Moghimi, Thomas Eisenbarth et al.
Microarchitectural side channels expose unprotected software to information leakage attacks where a software adversary is able to track runtime behavior of a benign process and steal secrets such as cryptographic keys. As suggested by incremental software patches for the RSA algorithm against variants of side-channel attacks within different versions of cryptographic libraries, protecting security-critical algorithms against side channels is an intricate task. Software protections avoid leakages by operating in constant time with a uniform resource usage pattern independent of the processed secret. In this respect, automated testing and verification of software binaries for leakage-free behavior is of importance, particularly when the source code is not available. In this work, we propose a novel technique based on Dynamic Binary Instrumentation and Mutual Information Analysis to efficiently locate and quantify memory based and control-flow based microarchitectural leakages. We develop a software framework named \tool~for side-channel analysis of binaries which can be extended to support new classes of leakage. For the first time, by utilizing \tool, we perform rigorous leakage analysis of two widely-used closed-source cryptographic libraries: \emph{Intel IPP} and \emph{Microsoft CNG}. We analyze $15$ different cryptographic implementations consisting of $112$ million instructions in about $105$ minutes of CPU time. By locating previously unknown leakages in hardened implementations, our results suggest that \tool~can efficiently find microarchitectural leakages in software binaries.
CRAug 3, 2018
DeepCloak: Adversarial Crafting As a Defensive Measure to Cloak ProcessesMehmet Sinan Inci, Thomas Eisenbarth, Berk Sunar
Over the past decade, side-channels have proven to be significant and practical threats to modern computing systems. Recent attacks have all exploited the underlying shared hardware. While practical, mounting such a complicated attack is still akin to listening on a private conversation in a crowded train station. The attacker has to either perform significant manual labor or use AI systems to automate the process. The recent academic literature points to the latter option. With the abundance of cheap computing power and the improvements made in AI, it is quite advantageous to automate such tasks. By using AI systems however, malicious parties also inherit their weaknesses. One such weakness is undoubtedly the vulnerability to adversarial samples. In contrast to the previous literature, for the first time, we propose the use of adversarial learning as a defensive tool to obfuscate and mask private information. We demonstrate the viability of this approach by first training CNNs and other machine learning classifiers on leakage trace of different processes. After training highly accurate models (99+% accuracy), we investigate their resolve against adversarial learning methods. By applying minimal perturbations to input traces, the adversarial traffic by the defender can run as an attachment to the original process and cloak it against a malicious classifier. Finally, we investigate whether an attacker can protect her classifier model by employing adversarial defense methods, namely adversarial re-training and defensive distillation. Our results show that even in the presence of an intelligent adversary that employs such techniques, all 10 of the tested adversarial learning methods still manage to successfully craft adversarial perturbations and the proposed cloaking methodology succeeds.
CRNov 21, 2017
MemJam: A False Dependency Attack against Constant-Time Crypto ImplementationsAhmad Moghimi, Thomas Eisenbarth, Berk Sunar
Cache attacks exploit memory access patterns of cryptographic implementations. Constant-Time implementation techniques have become an indispensable tool in fighting cache timing attacks. These techniques engineer the memory accesses of cryptographic operations to follow a uniform key independent pattern. However, the constant-time behavior is dependent on the underlying architecture, which can be highly complex and often incorporates unpublished features. CacheBleed attack targets cache bank conflicts and thereby invalidates the assumption that microarchitectural side-channel adversaries can only observe memory with cache line granularity. In this work, we propose MemJam, a side-channel attack that exploits false dependency of memory read-after-write and provides a high quality intra cache level timing channel. As a proof of concept, we demonstrate the first key recovery attacks on a constant-time implementation of AES, and a SM4 implementation with cache protection in the current Intel Integrated Performance Primitives (Intel IPP) cryptographic library. Further, we demonstrate the first intra cache level timing attack on SGX by reproducing the AES key recovery results on an enclave that performs encryption using the aforementioned constant-time implementation of AES. Our results show that we can not only use this side channel to efficiently attack memory dependent cryptographic operations but also to bypass proposed protections. Compared to CacheBleed, which is limited to older processor generations, MemJam is the first intra cache level attack applicable to all major Intel processors including the latest generations that support the SGX extension.
CRSep 6, 2017
CacheShield: Protecting Legacy Processes Against Cache AttacksSamira Briongos, Gorka Irazoqui, Pedro Malagón et al.
Cache attacks pose a threat to any code whose execution flow or memory accesses depend on sensitive information. Especially in public clouds, where caches are shared across several tenants, cache attacks remain an unsolved problem. Cache attacks rely on evictions by the spy process, which alter the execution behavior of the victim process. We show that hardware performance events of cryptographic routines reveal the presence of cache attacks. Based on this observation, we propose CacheShield, a tool to protect legacy code by monitoring its execution and detecting the presence of cache attacks, thus providing the opportunity to take preventative measures. CacheShield can be run by users and does not require alteration of the OS or hypervisor, while previously proposed software-based countermeasures require cooperation from the hypervisor. Unlike methods that try to detect malicious processes, our approach is lean, as only a fraction of the system needs to be monitored. It also integrates well into today's cloud infrastructure, as concerned users can opt to use CacheShield without support from the cloud service provider. Our results show that CacheShield detects cache attacks fast, with high reliability, and with few false positives, even in the presence of strong noise.
CRSep 5, 2017
Did we learn from LLC Side Channel Attacks? A Cache Leakage Detection Tool for Crypto LibrariesGorka Irazoqui, Kai Cong, Xiaofei Guo et al.
This work presents a new tool to verify the correctness of cryptographic implementations with respect to cache attacks. Our methodology discovers vulnerabilities that are hard to find with other techniques, observed as exploitable leakage. The methodology works by identifying secret dependent memory and introducing forced evictions inside potentially vulnerable code to obtain cache traces that are analyzed using Mutual Information. If dependence is observed, the cryptographic implementation is classified as to leak information. We demonstrate the viability of our technique in the design of the three main cryptographic primitives, i.e., AES, RSA and ECC, in eight popular up to date cryptographic libraries, including OpenSSL, Libgcrypt, Intel IPP and NSS. Our results show that cryptographic code designers are far away from incorporating the appropriate countermeasures to avoid cache leakages, as we found that 50% of the default implementations analyzed leaked information that lead to key extraction. We responsibly notified the designers of all the leakages found and suggested patches to solve these vulnerabilities.
CRMay 12, 2017
PerfWeb: How to Violate Web Privacy with Hardware Performance EventsBerk Gulmezoglu, Andreas Zankl, Thomas Eisenbarth et al.
The browser history reveals highly sensitive information about users, such as financial status, health conditions, or political views. Private browsing modes and anonymity networks are consequently important tools to preserve the privacy not only of regular users but in particular of whistleblowers and dissidents. Yet, in this work we show how a malicious application can infer opened websites from Google Chrome in Incognito mode and from Tor Browser by exploiting hardware performance events (HPEs). In particular, we analyze the browsers' microarchitectural footprint with the help of advanced Machine Learning techniques: k-th Nearest Neighbors, Decision Trees, Support Vector Machines, and in contrast to previous literature also Convolutional Neural Networks. We profile 40 different websites, 30 of the top Alexa sites and 10 whistleblowing portals, on two machines featuring an Intel and an ARM processor. By monitoring retired instructions, cache accesses, and bus cycles for at most 5 seconds, we manage to classify the selected websites with a success rate of up to 86.3%. The results show that hardware performance events can clearly undermine the privacy of web users. We therefore propose mitigation strategies that impede our attacks and still allow legitimate use of HPEs.
CRMar 28, 2017
AutoLock: Why Cache Attacks on ARM Are Harder Than You ThinkMarc Green, Leandro Rodrigues-Lima, Andreas Zankl et al.
Attacks on the microarchitecture of modern processors have become a practical threat to security and privacy in desktop and cloud computing. Recently, cache attacks have successfully been demonstrated on ARM based mobile devices, suggesting they are as vulnerable as their desktop or server counterparts. In this work, we show that previous literature might have left an overly pessimistic conclusion of ARM's security as we unveil AutoLock: an internal performance enhancement found in inclusive cache levels of ARM processors that adversely affects Evict+Time, Prime+Probe, and Evict+Reload attacks. AutoLock's presence on system-on-chips (SoCs) is not publicly documented, yet knowing that it is implemented is vital to correctly assess the risk of cache attacks. We therefore provide a detailed description of the feature and propose three ways to detect its presence on actual SoCs. We illustrate how AutoLock impedes cross-core cache evictions, but show that its effect can also be compensated in a practical attack. Our findings highlight the intricacies of cache attacks on ARM and suggest that a fair and comprehensive vulnerability assessment requires an in-depth understanding of ARM's cache architectures and rigorous testing across a broad range of ARM based devices.
CRMar 20, 2017
CacheZoom: How SGX Amplifies The Power of Cache AttacksAhmad Moghimi, Gorka Irazoqui, Thomas Eisenbarth
In modern computing environments, hardware resources are commonly shared, and parallel computation is widely used. Parallel tasks can cause privacy and security problems if proper isolation is not enforced. Intel proposed SGX to create a trusted execution environment within the processor. SGX relies on the hardware, and claims runtime protection even if the OS and other software components are malicious. However, SGX disregards side-channel attacks. We introduce a powerful cache side-channel attack that provides system adversaries a high resolution channel. Our attack tool named CacheZoom is able to virtually track all memory accesses of SGX enclaves with high spatial and temporal precision. As proof of concept, we demonstrate AES key recovery attacks on commonly used implementations including those that were believed to be resistant in previous scenarios. Our results show that SGX cannot protect critical data sensitive computations, and efficient AES key recovery is possible in a practical environment. In contrast to previous works which require hundreds of measurements, this is the first cache side-channel attack on a real system that can recover AES keys with a minimal number of measurements. We can successfully recover AES keys from T-Table based implementations with as few as ten measurements.