Thorsten Holz

CR
h-index: 36
41 papers
4,099 citations
Novelty: 57%
AI Score: 60

41 Papers

CR · Feb 8, 2023 · Code
CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models

Hossein Hajipour, Keno Hassler, Thorsten Holz et al.

Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks. Their advances in competition-level programming problems have made them an essential pillar of AI-assisted pair programming, and tools such as GitHub Copilot have emerged as part of the daily programming workflow used by millions of developers. The training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities. This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure. While these models have been extensively assessed for their ability to produce functionally correct programs, there remains a lack of comprehensive investigations and benchmarks addressing the security aspects of these models. In this work, we propose a method to systematically study the security issues of code language models to assess their susceptibility to generating vulnerable code. To this end, we introduce the first approach to automatically find generated code that contains vulnerabilities in black-box code generation models. To achieve this, we present an approach to approximate inversion of the black-box code generation models based on few-shot prompting. We evaluate the effectiveness of our approach by examining code language models in generating high-risk security weaknesses. Furthermore, we establish a collection of diverse non-secure prompts for various vulnerability scenarios using our method. This dataset forms a benchmark for evaluating and comparing the security weaknesses in code language models.
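The few-shot idea in the abstract above can be illustrated with a small prompt builder. This is a minimal sketch under stated assumptions: the abstract only says that inversion is approximated via few-shot prompting, so the `### Code` / `### Prompt` template, the pairing of code with prompts, and the function name `build_inversion_prompt` are all hypothetical and not taken from the paper.

```python
from typing import List, Tuple


def build_inversion_prompt(examples: List[Tuple[str, str]], target_code: str) -> str:
    """Assemble a few-shot prompt that asks a model to propose a prompt for target_code.

    Each example is a (vulnerable_code, prompt) pair. The section markers are
    an illustrative assumption, not the template used in the paper.
    """
    parts = []
    for code, prompt in examples:
        # One demonstration: a code sample followed by a prompt that could yield it.
        parts.append(f"### Code\n{code}\n### Prompt\n{prompt}\n")
    # The final block leaves the prompt open so the model completes it.
    parts.append(f"### Code\n{target_code}\n### Prompt\n")
    return "\n".join(parts)


# Hypothetical demonstrations of an SQL-injection-style weakness.
EXAMPLES = [
    ("query = \"SELECT * FROM users WHERE id = \" + user_id",
     "# look up a user by id using string concatenation"),
    ("cursor.execute(\"DELETE FROM t WHERE name = '%s'\" % name)",
     "# delete a row by name"),
]
few_shot = build_inversion_prompt(EXAMPLES, "cursor.execute(\"SELECT * FROM orders WHERE id = \" + oid)")
```

The resulting string would be sent to the black-box model, and its completion would serve as a candidate non-secure prompt; how such candidates are filtered and scored is outside this sketch.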

CR · Feb 23, 2023
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra et al.

Large Language Models (LLMs) are increasingly being integrated into various applications. The functionalities of recent LLMs can be flexibly modulated via natural language prompts. This renders them susceptible to targeted adversarial prompting; for example, Prompt Injection (PI) attacks enable attackers to override original instructions and employed controls. So far, it was assumed that the user is directly prompting the LLM. But what if it is not the user prompting? We argue that LLM-Integrated Applications blur the line between data and instructions. We reveal new attack vectors, using Indirect Prompt Injection, that enable adversaries to remotely (without a direct interface) exploit LLM-integrated applications by strategically injecting prompts into data likely to be retrieved. We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities, including data theft, worming, information ecosystem contamination, and other novel security risks. We demonstrate our attacks' practical viability against both real-world systems, such as Bing's GPT-4 powered Chat and code-completion engines, and synthetic applications built on GPT-4. We show how processing retrieved prompts can act as arbitrary code execution, manipulate the application's functionality, and control how and if other APIs are called. Despite the increasing integration and reliance on LLMs, effective mitigations of these emerging threats are currently lacking. By raising awareness of these vulnerabilities and providing key insights into their implications, we aim to promote the safe and responsible deployment of these powerful models and the development of robust defenses that protect users and systems from potential attacks.

CV · Oct 26, 2022
Towards the Detection of Diffusion Model Deepfakes

Jonas Ricker, Simon Damm, Thorsten Holz et al.

Over the past few years, diffusion models (DMs) have reached an unprecedented level of visual quality. However, relatively little attention has been paid to the detection of DM-generated images, which is critical to prevent adverse impacts on our society. In contrast, generative adversarial networks (GANs) have been extensively studied from a forensic perspective. In this work, we therefore take the natural next step to evaluate whether previous methods can be used to detect images generated by DMs. Our experiments yield two key findings: (1) state-of-the-art GAN detectors are unable to reliably distinguish real from DM-generated images, but (2) re-training them on DM-generated images allows for almost perfect detection, which remarkably even generalizes to GANs. Together with a feature space analysis, our results lead to the hypothesis that DMs produce fewer detectable artifacts and are thus more difficult to detect compared to GANs. One possible reason for this is the absence of grid-like frequency artifacts in DM-generated images, which are a known weakness of GANs. However, we make the interesting observation that diffusion models tend to underestimate high frequencies, which we attribute to the learning objective.

CR · Mar 25, 2023
No more Reviewer #2: Subverting Automatic Paper-Reviewer Assignment using Adversarial Learning

Thorsten Eisenhofer, Erwin Quiring, Jonas Möller et al.

The number of papers submitted to academic conferences is steadily rising in many scientific disciplines. To handle this growth, systems for automatic paper-reviewer assignments are increasingly used during the reviewing process. These systems use statistical topic models to characterize the content of submissions and automate the assignment to reviewers. In this paper, we show that this automation can be manipulated using adversarial learning. We propose an attack that adapts a given paper so that it misleads the assignment and selects its own reviewers. Our attack is based on a novel optimization strategy that alternates between the feature space and problem space to realize unobtrusive changes to the paper. To evaluate the feasibility of our attack, we simulate the paper-reviewer assignment of an actual security conference (IEEE S&P) with 165 reviewers on the program committee. Our results show that we can successfully select and remove reviewers without access to the assignment system. Moreover, we demonstrate that the manipulated papers remain plausible and are often indistinguishable from benign submissions.
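To make the assignment step in the abstract above concrete, here is a toy model of a topic-based paper-reviewer assignment. It is a sketch under assumptions: the three-topic vectors, the reviewer names, and the use of cosine similarity are illustrative choices, and the code does not implement the paper's attack, which alternates between feature space and problem space.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two topic vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_reviewers(paper: Sequence[float],
                     reviewers: Dict[str, Sequence[float]],
                     k: int = 1) -> List[str]:
    """Return the k reviewers whose topic profile is most similar to the paper."""
    ranked = sorted(reviewers, key=lambda name: cosine(paper, reviewers[name]), reverse=True)
    return ranked[:k]


# Hypothetical reviewer profiles over three topics.
REVIEWERS = {
    "reviewer_a": [0.8, 0.1, 0.1],
    "reviewer_b": [0.1, 0.8, 0.1],
    "reviewer_c": [0.1, 0.1, 0.8],
}

original_paper = [0.7, 0.2, 0.1]     # topic mixture of the unmodified submission
manipulated_paper = [0.1, 0.2, 0.7]  # mixture after an assumed adversarial rewrite

before = assign_reviewers(original_paper, REVIEWERS)
after = assign_reviewers(manipulated_paper, REVIEWERS)
```

In this toy setting, shifting the paper's topic mixture moves the top match from `reviewer_a` to `reviewer_c`. The hard part addressed by the paper, finding unobtrusive text changes that produce such a shift, is not modeled here.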

81.1 · CR · May 14 · Code
Toward Securing AI Agents Like Operating Systems

Lukas Pirch, Micha Horlboge, Patrick Großmann et al.

Autonomous agents based on large language models (LLMs) are rapidly emerging as a general-purpose technology, with recent systems such as OpenClaw extending their capabilities through broad tool use, third-party skills, and deeper integration into user environments. At the same time, these agentic systems introduce substantial security risks by combining unconstrained capabilities with access to sensitive user data. In this work, we investigate the security of LLM-based agents through the lens of operating systems. We argue that both face strikingly similar challenges in isolating resources, separating privileges, and mediating communication. Guided by this perspective, we survey the current landscape of open-source agents, derive a unified agent architecture, and systematically analyze potential attack vectors. To validate this analysis, we conduct a case study evaluating four widely used OpenClaw-like agents. Even under modest attacker capabilities, we find that several protection mechanisms fail in practice and that secure operation requires detailed system knowledge and careful configuration. However, we also observe that while some agentic capabilities remain insecure by design, many vulnerabilities can be mitigated using well-established techniques from operating system security. We conclude with a set of recommendations for the secure design of agentic systems.

CR · Sep 10, 2024
HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data

Hossein Hajipour, Lea Schönherr, Thorsten Holz et al.

Large language models (LLMs) have shown great potential for automatic code generation and form the basis for various tools such as GitHub Copilot. However, recent studies highlight that much LLM-generated code contains serious security vulnerabilities. While previous work tries to address this by training models that generate secure code, these attempts remain constrained by limited access to training data and labor-intensive data preparation. In this paper, we introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure code by automatically synthesizing secure code, which reduces the effort of finding suitable training data. HexaCoder comprises two key components: an oracle-guided data synthesis pipeline and a two-step process for secure code generation. The data synthesis pipeline generates pairs of vulnerable and fixed code for specific Common Weakness Enumeration (CWE) types by utilizing a state-of-the-art LLM for repairing vulnerable code. A security oracle identifies vulnerabilities, and a state-of-the-art LLM repairs them by extending and/or editing the code, creating data pairs for fine-tuning using the Low-Rank Adaptation (LoRA) method. Each example of our fine-tuning dataset includes the necessary security-related libraries and code that form the basis of our novel two-step generation approach. This allows the model to integrate security-relevant libraries before generating the main code, significantly reducing the number of vulnerable code samples generated by up to 85% compared to the baseline methods. We perform extensive evaluations on three different benchmarks for four LLMs, demonstrating that HexaCoder not only improves the security of the generated code but also maintains a high level of functional correctness.

76.7 · AI · May 17
The Capability Paradox: How Smarter Auditors Make Multi-Agent Systems Less Secure

Qiqi Liu, Thorsten Holz, Shilin Ye et al.

Multi-agent systems extend large language models (LLMs) by decomposing tasks among specialized agents, but their distributed decision process creates new attack surfaces. We identify semantic hijacking, an attack in which harmful requests are concealed within domain-specific narratives and propagated to a Manager through Worker reports, without any syntactic injection primitives. Across 42,000 adversarial trials over 12 Manager models and 7 Worker configurations, we uncover a capability paradox: as Worker capability increases, the mean system-level Attack Success Rate (ASR) increases from 18.4% to 63.9%, peaking at 94.4%. To explain this effect, we conduct multi-level mediation analysis on two independent datasets (47,807 interactions). This analysis shows that this paradox is driven by linguistic certainty: stronger Workers are more likely to interpret adversarial narratives as legitimate, convey their conclusions assertively, and thereby lead Managers to treat such confident endorsements as justification to execute. In our larger Worker-Only setting (n_W = 14), certainty mediates 74% of the effect, with 95% confidence intervals (CI) excluding zero under both Monte Carlo and cluster bootstrap; the smaller Full-MAS setting (n_W = 6) shows a directionally consistent indirect effect. Worker-side safety prompting does not reliably mitigate this failure. Building on the mediation finding, we propose heterogeneous ensemble verification, which pairs Workers of asymmetric domain competence so their complementary vulnerabilities break the certainty-to-execution chain, reducing ASR from 52.8% to 2.0% with negligible benign-task impact. Our results show that upgrading components to stronger models can actively degrade system security, and that effective defenses require exploiting, rather than eliminating, capability asymmetries between agents.

CR · Sep 29, 2025 · Code
A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory

Qianshan Wei, Tengchao Yang, Yaochen Wang et al.

Large Language Model (LLM) agents use memory to learn from past interactions, enabling autonomous planning and decision-making in complex environments. However, this reliance on memory introduces a critical security risk: an adversary can inject seemingly harmless records into an agent's memory to manipulate its future behavior. This vulnerability is characterized by two core aspects: First, the malicious effect of injected records is only activated within a specific context, making them hard to detect when individual memory entries are audited in isolation. Second, once triggered, the manipulation can initiate a self-reinforcing error cycle: the corrupted outcome is stored as precedent, which not only amplifies the initial error but also progressively lowers the threshold for similar attacks in the future. To address these challenges, we introduce A-MemGuard (Agent-Memory Guard), the first proactive defense framework for LLM agent memory. The core idea of our work is the insight that memory itself must become both self-checking and self-correcting. Without modifying the agent's core architecture, A-MemGuard combines two mechanisms: (1) consensus-based validation, which detects anomalies by comparing reasoning paths derived from multiple related memories, and (2) a dual-memory structure, where detected failures are distilled into "lessons" stored separately and consulted before future actions, breaking error cycles and enabling adaptation. Comprehensive evaluations on multiple benchmarks show that A-MemGuard effectively cuts attack success rates by over 95% while incurring a minimal utility cost. This work shifts LLM memory security from static filtering to a proactive, experience-driven model where defenses strengthen over time. Our code is available at https://github.com/TangciuYueng/AMemGuard
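The two mechanisms named in the abstract above can be pictured with a toy example. This is a sketch under loud assumptions and not A-MemGuard's implementation: plain string equality stands in for comparing reasoning paths, a simple majority vote stands in for consensus, and the names `consensus_check` and `LessonMemory` are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def consensus_check(proposed_actions: List[str],
                    min_agreement: float = 0.5) -> Tuple[Optional[str], List[int]]:
    """Return (majority action or None, indices of paths that deviate from it).

    Each entry is the action suggested by reasoning from one related memory.
    If no action is backed by more than min_agreement of the paths, every
    path is treated as suspicious.
    """
    if not proposed_actions:
        return None, []
    action, votes = Counter(proposed_actions).most_common(1)[0]
    if votes / len(proposed_actions) <= min_agreement:
        return None, list(range(len(proposed_actions)))
    outliers = [i for i, a in enumerate(proposed_actions) if a != action]
    return action, outliers


class LessonMemory:
    """Separate store for rejected (context, action) pairs, consulted before acting."""

    def __init__(self) -> None:
        self.lessons: List[Tuple[str, str]] = []

    def record(self, context: str, rejected_action: str) -> None:
        self.lessons.append((context, rejected_action))

    def warns_against(self, context: str, action: str) -> bool:
        return (context, action) in self.lessons


# Three memories suggest actions for the same context; one of them was injected.
actions = ["verify_sender", "verify_sender", "forward_credentials"]
chosen, suspicious = consensus_check(actions)

lessons = LessonMemory()
for idx in suspicious:
    lessons.record("email_request", actions[idx])
```

Here the deviating path is flagged and its action is stored as a lesson, so a later check such as `lessons.warns_against("email_request", "forward_credentials")` can block a repeat of the same manipulation.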

96.9 · CR · May 11
ExploitGym: Can AI Agents Turn Security Vulnerabilities into Real Attacks?

Zhun Wang, Nico Schiller, Hongwei Li et al.

AI agents are rapidly gaining capabilities that could significantly reshape cybersecurity, making rigorous evaluation urgent. A critical capability is exploitation: turning a vulnerability, which is not yet an attack, into a concrete security impact, such as unauthorized file access or code execution. Exploitation is a particularly challenging task because it requires low-level program reasoning (e.g., about memory layout), runtime adaptation, and sustained progress over long horizons. Meanwhile, it is inherently dual-use, supporting defensive workflows while lowering the barrier for offense. Despite its importance and diagnostic value, exploitation remains under-evaluated. To address this gap, we introduce ExploitGym, a large-scale, diverse, realistic benchmark on the exploitation capabilities of AI agents. Given a program input that triggers a vulnerability, ExploitGym tasks agents with progressively extending it into a working exploit. The benchmark comprises 898 instances sourced from real-world vulnerabilities across three domains, including userspace programs, Google's V8 JavaScript engine, and the Linux kernel. We vary the security protections applied to each instance, isolating their impact on agent performance. All configurations are packaged in reproducible containerized environments. Our evaluation shows that while exploitation remains challenging, frontier models can successfully exploit a non-trivial fraction of vulnerabilities. For example, the strongest configurations are Anthropic's latest model Claude Mythos Preview and OpenAI's GPT-5.5, which produce working exploits for 157 and 120 instances, respectively. Notably, even with widely used defenses enabled, models retain non-trivial success rates. These results establish ExploitGym as an effective testbed for exploitation and highlight the growing cybersecurity risks posed by increasingly capable AI agents.

CR · Jul 5, 2020 · Code
EvilCoder: Automated Bug Insertion

Jannik Pewny, Thorsten Holz

The art of finding software vulnerabilities has been covered extensively in the literature and there is a huge body of work on this topic. In contrast, the intentional insertion of exploitable, security-critical bugs has received little (public) attention so far. Wanting more bugs seems to be counterproductive at first sight, but the comprehensive evaluation of bug-finding techniques suffers from a lack of ground truth and the scarcity of bugs. In this paper, we propose EvilCoder, a system to automatically find potentially vulnerable source code locations and modify the source code to be actually vulnerable. More specifically, we leverage automated program analysis techniques to find sensitive sinks which match typical bug patterns (e.g., a sensitive API function with a preceding sanity check), and try to find data-flow connections to user-controlled sources. We then transform the source code such that exploitation becomes possible, for example by removing or modifying input sanitization or other types of security checks. Our tool is designed to randomly pick vulnerable locations and possible modifications, such that it can generate numerous different vulnerabilities on the same software corpus. We evaluated our tool on several open-source projects such as libpng and vsftpd, where we found between 22 and 158 unique connected source-sink pairs per project. This translates to hundreds of potentially vulnerable data-flow paths and hundreds of bugs we can insert. We hope to support future bug-finding techniques by supplying freshly generated, bug-ridden test corpora so that such techniques can (finally) be evaluated and compared in a comprehensive and statistically meaningful way.

31.2 · CR · May 8
CCX: Enabling Unmodified Intel SGX Applications on Arm CCA

Matti Schulze, Thorsten Holz, Felix Freiling

Novel confidential computing technologies such as Intel TDX, AMD SEV, and Arm CCA have recently emerged. In practice, due to its minimal trust boundaries, Intel SGX still remains widely used for enclave-based applications in cloud environments, including confidential cloud services, privacy-preserving communication, secure payment processing, and privacy-focused advertising. With the growing adoption of Arm CPUs in cloud systems, however, existing SGX applications face a significant portability challenge: they are tightly coupled to SGX-specific APIs and execution semantics. In this paper, we present the design and implementation of CCX, a framework that enables existing SGX applications to run on Arm CCA without source code modification. To this end, CCX redesigns SGX functionality within Arm CCA firmware, adapting SGX abstractions to CCA's architecture design while preserving full compatibility with existing applications originally developed for SGX. We implemented a prototype of CCX on both the QEMU emulator and a Nitrogen8M development board. Our evaluation shows that CCX is capable of executing existing SGX applications without requiring source code changes, while providing security guarantees comparable to Intel SGX and achieving performance improvements in our evaluated settings.

CR · Apr 22, 2024
AI-Generated Faces in the Real World: A Large-Scale Case Study of Twitter Profile Images

Jonas Ricker, Dennis Assenmacher, Thorsten Holz et al.

Recent advances in the field of generative artificial intelligence (AI) have blurred the lines between authentic and machine-generated content, making it almost impossible for humans to distinguish between such media. One notable consequence is the use of AI-generated images for fake profiles on social media. While several types of disinformation campaigns and similar incidents have been reported in the past, a systematic analysis has been lacking. In this work, we conduct the first large-scale investigation of the prevalence of AI-generated profile pictures on Twitter. We tackle the challenges of a real-world measurement study by carefully integrating various data sources and designing a multi-stage detection pipeline. Our analysis of nearly 15 million Twitter profile pictures shows that 0.052% were artificially generated, confirming their notable presence on the platform. We comprehensively examine the characteristics of these accounts and their tweet content, and uncover patterns of coordinated inauthentic behavior. The results also reveal several motives, including spamming and political amplification campaigns. Our research reaffirms the need for effective detection and mitigation strategies to cope with the potential negative effects of generative AI in the future.

CR · May 28, 2025
Security Benefits and Side Effects of Labeling AI-Generated Images

Sandra Höltervennhoff, Jonas Ricker, Maike M. Raphael et al.

Generative artificial intelligence is developing rapidly, impacting humans' interaction with information and digital media. It is increasingly used to create deceptively realistic misinformation, so lawmakers have imposed regulations requiring the disclosure of AI-generated content. However, little is known about whether these labels reduce the risks of AI-generated misinformation. Our work addresses this research gap. Focusing on AI-generated images, we study the implications of labels, including the possibility of mislabeling. Assuming that simplicity, transparency, and trust are likely to impact the successful adoption of such labels, we first qualitatively explore users' opinions and expectations of AI labeling using five focus groups. Second, we conduct a pre-registered online survey with over 1300 U.S. and EU participants to quantitatively assess the effect of AI labels on users' ability to recognize misinformation containing either human-made or AI-generated images. Our focus groups illustrate that, while participants have concerns about the practical implementation of labeling, they consider it helpful in identifying AI-generated images and avoiding deception. However, considering security benefits, our survey revealed an ambiguous picture, suggesting that users might over-rely on labels. While inaccurate claims supported by labeled AI-generated images were rated less credible than those with unlabeled AI-generated images, the belief in accurate claims also decreased when accompanied by a labeled AI-generated image. Moreover, we find the undesired side effect that human-made images conveying inaccurate claims were perceived as more credible in the presence of labels.

CR · Dec 10, 2023
A Representative Study on Human Detection of Artificially Generated Media Across Countries

Joel Frank, Franziska Herbert, Jonas Ricker et al.

AI-generated media has become a threat to our digital society as we know it. These forgeries can be created automatically and on a large scale based on publicly available technology. Recognizing this challenge, academics and practitioners have proposed a multitude of automatic detection strategies to detect such artificial media. However, in contrast to these technical advances, the human perception of generated media has not been thoroughly studied yet. In this paper, we aim at closing this research gap. We perform the first comprehensive survey into people's ability to detect generated media, spanning three countries (USA, Germany, and China) with 3,002 participants across audio, image, and text media. Our results indicate that state-of-the-art forgeries are almost indistinguishable from "real" media, with the majority of participants simply guessing when asked to rate them as human- or machine-generated. In addition, AI-generated media are rated as more human-like across all media types and all countries. To further understand which factors influence people's ability to detect generated media, we include personal variables, chosen based on a literature review in the domains of deepfake and fake news research. In a regression analysis, we found that generalized trust, cognitive reflection, and self-reported familiarity with deepfakes significantly influence participants' decisions across all media categories.

CR · Nov 4, 2021
Nyx-Net: Network Fuzzing with Incremental Snapshots

Sergej Schumilo, Cornelius Aschermann, Andrea Jemmett et al.

Coverage-guided fuzz testing ("fuzzing") has become mainstream and we have observed lots of progress in this research area recently. However, it is still challenging to efficiently test network services with existing coverage-guided fuzzing methods. In this paper, we introduce the design and implementation of Nyx-Net, a novel snapshot-based fuzzing approach that can successfully fuzz a wide range of targets spanning servers, clients, games, and even Firefox's Inter-Process Communication (IPC) interface. Compared to state-of-the-art methods, Nyx-Net improves test throughput by up to 300x and coverage found by up to 70%. Additionally, Nyx-Net is able to find crashes in two of ProFuzzBench's targets that no other fuzzer found previously. When using Nyx-Net to play the game Super Mario, Nyx-Net shows speedups of 10-30x compared to existing work. Under some circumstances, Nyx-Net is even able to play "faster than light": solving the level takes less wall-clock time than playing the level perfectly even once. Nyx-Net is able to find previously unknown bugs in servers such as Lighttpd, clients such as MySQL client, and even Firefox's IPC mechanism, demonstrating the strength and versatility of the proposed approach. Lastly, our prototype implementation was awarded a $20,000 bug bounty for enabling fuzzing on previously unfuzzable code in Firefox and solving a long-standing problem at Mozilla.

CR · Jun 16, 2021
Technical Report: Hardening Code Obfuscation Against Automated Attacks

Moritz Schloegel, Tim Blazytko, Moritz Contag et al.

Software obfuscation is a crucial technology to protect intellectual property and manage digital rights within our society. Despite its huge practical importance, both commercial and academic state-of-the-art obfuscation methods are vulnerable to a plethora of automated deobfuscation attacks, such as symbolic execution, taint analysis, or program synthesis. While several enhanced obfuscation techniques were recently proposed to thwart taint analysis or symbolic execution, they either impose a prohibitive runtime overhead or can be removed in an automated way (e.g., via compiler optimizations). In general, these techniques suffer from focusing on a single attack vector, allowing an attacker to switch to other, more effective techniques, such as program synthesis. In this work, we present Loki, an approach for software obfuscation that is resilient against all known automated deobfuscation attacks. To this end, we use and efficiently combine multiple techniques, including a generic approach to synthesize formally verified expressions of arbitrary complexity. Contrary to state-of-the-art approaches that rely on a few hardcoded generation rules, our expressions are more diverse and harder to pattern match against. Even the most recent state-of-the-art research on Mixed-Boolean Arithmetic (MBA) deobfuscation fails to simplify them. Moreover, Loki protects against previously unaccounted attack vectors such as program synthesis, for which it reduces the success rate to merely 19%. In a comprehensive evaluation, we show that our design incurs significantly less overhead while providing a much stronger protection level compared to existing works.

CV · Apr 7, 2021
[RE] CNN-generated images are surprisingly easy to spot...for now

Joel Frank, Thorsten Holz

This work evaluates the reproducibility of the paper "CNN-generated images are surprisingly easy to spot... for now" by Wang et al. published at CVPR 2020. The paper addresses the challenge of detecting CNN-generated imagery, which has reached the potential to even fool humans. The authors propose two methods which help an image classifier to generalize from being trained on one specific CNN to detecting imagery produced by unseen architectures, training methods, or data sets: (i) utilizing different kinds of data augmentations and (ii) using a diverse data set. This report focuses on assessing whether these techniques indeed help the generalization process. Furthermore, we perform additional experiments to study the limitations of the proposed techniques.

CR · Feb 10, 2021
Dompteur: Taming Audio Adversarial Examples

Thorsten Eisenhofer, Lea Schönherr, Joel Frank et al.

Adversarial examples seem to be inevitable. These specifically crafted inputs allow attackers to arbitrarily manipulate machine learning systems. Even worse, they often seem harmless to human observers. In our digital society, this poses a significant threat. For example, Automatic Speech Recognition (ASR) systems, which serve as hands-free interfaces to many kinds of systems, can be attacked with inputs incomprehensible for human listeners. The research community has unsuccessfully tried several approaches to tackle this problem. In this paper we propose a different perspective: We accept the presence of adversarial examples against ASR systems, but we require them to be perceivable by human listeners. By applying the principles of psychoacoustics, we can remove semantically irrelevant information from the ASR input and train a model that resembles human perception more closely. We implement our idea in a tool named DOMPTEUR and demonstrate that our augmented system, in contrast to an unmodified baseline, successfully focuses on perceptible ranges of the input signal. This change forces adversarial examples into the audible range, while using minimal computational overhead and preserving benign performance. To evaluate our approach, we construct an adaptive attacker that actively tries to avoid our augmentations and demonstrate that adversarial examples from this attacker remain clearly perceivable. Finally, we substantiate our claims by performing a hearing test with crowd-sourced human listeners.

SD · Oct 21, 2020
VenoMave: Targeted Poisoning Against Speech Recognition

Hojjat Aghakhani, Lea Schönherr, Thorsten Eisenhofer et al.

Despite remarkable improvements, automatic speech recognition is susceptible to adversarial perturbations. Compared to standard machine learning architectures, these attacks are significantly more challenging, especially since the inputs to a speech recognition system are time series that contain both acoustic and linguistic properties of speech. Extracting all recognition-relevant information requires more complex pipelines and an ensemble of specialized components. Consequently, an attacker needs to consider the entire pipeline. In this paper, we present VENOMAVE, the first training-time poisoning attack against speech recognition. Similar to the predominantly studied evasion attacks, we pursue the same goal: leading the system to an incorrect and attacker-chosen transcription of a target audio waveform. In contrast to evasion attacks, however, we assume that the attacker can only manipulate a small part of the training data without altering the target audio waveform at runtime. We evaluate our attack on two datasets: TIDIGITS and Speech Commands. When poisoning less than 0.17% of the dataset, VENOMAVE achieves attack success rates of more than 80.0%, without access to the victim's network architecture or hyperparameters. In a more realistic scenario, when the target audio waveform is played over the air in different rooms, VENOMAVE maintains a success rate of up to 73.3%. Finally, VENOMAVE achieves an attack transferability rate of 36.4% between two different model architectures.

CR · Aug 2, 2020
Unacceptable, where is my privacy? Exploring Accidental Triggers of Smart Speakers

Lea Schönherr, Maximilian Golla, Thorsten Eisenhofer et al.

Voice assistants like Amazon's Alexa, Google's Assistant, or Apple's Siri have become the primary (voice) interface in smart speakers that can be found in millions of households. For privacy reasons, these speakers analyze every sound in their environment for their respective wake word like "Alexa" or "Hey Siri," before uploading the audio stream to the cloud for further processing. Previous work reported on the inaccurate wake word detection, which can be tricked using similar words or sounds like "cocaine noodles" instead of "OK Google." In this paper, we perform a comprehensive analysis of such accidental triggers, i.e., sounds that should not have triggered the voice assistant, but did. More specifically, we automate the process of finding accidental triggers and measure their prevalence across 11 smart speakers from 8 different manufacturers using everyday media such as TV shows, news, and other kinds of audio datasets. To systematically detect accidental triggers, we describe a method to artificially craft such triggers using a pronouncing dictionary and a weighted, phone-based Levenshtein distance. In total, we have found hundreds of accidental triggers. Moreover, we explore potential gender and language biases and analyze the reproducibility. Finally, we discuss the resulting privacy implications of accidental triggers and explore countermeasures to reduce and limit their impact on users' privacy. To foster additional research on these sounds that mislead machine learning models, we publish a dataset of more than 1000 verified triggers as a research artifact.
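The weighted, phone-based Levenshtein distance mentioned in the abstract above can be sketched as a standard edit-distance table in which substituting acoustically similar phones costs less than substituting unrelated ones. The cost values, the ARPAbet-style phone sequences, and the names `SIMILAR_PHONES` and `weighted_phone_distance` are illustrative assumptions, not the weights or code used in the paper.

```python
from typing import Dict, FrozenSet, Sequence

# Hypothetical substitution costs for acoustically similar phone pairs.
# Any pair not listed here costs 1.0, the same as an insertion or deletion.
SIMILAR_PHONES: Dict[FrozenSet[str], float] = {
    frozenset({"S", "Z"}): 0.3,
    frozenset({"K", "G"}): 0.4,
    frozenset({"AH", "EH"}): 0.5,
}


def substitution_cost(a: str, b: str) -> float:
    """Cost of replacing phone a with phone b."""
    if a == b:
        return 0.0
    return SIMILAR_PHONES.get(frozenset({a, b}), 1.0)


def weighted_phone_distance(src: Sequence[str], dst: Sequence[str],
                            indel_cost: float = 1.0) -> float:
    """Levenshtein distance over phone sequences with weighted substitutions."""
    rows, cols = len(src) + 1, len(dst) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * indel_cost  # delete all of src[:i]
    for j in range(1, cols):
        dist[0][j] = j * indel_cost  # insert all of dst[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost,  # deletion
                dist[i][j - 1] + indel_cost,  # insertion
                dist[i - 1][j - 1] + substitution_cost(src[i - 1], dst[j - 1]),
            )
    return dist[-1][-1]


# Hand-written phone sequences (assumptions): the wake word and a near miss.
WAKE_WORD = ["AH", "L", "EH", "K", "S", "AH"]
CANDIDATE = ["AH", "L", "EH", "G", "Z", "AH"]
score = weighted_phone_distance(WAKE_WORD, CANDIDATE)
```

With these assumed weights, the candidate differs from the wake word by two cheap substitutions (K to G, S to Z), so it scores much closer than a sequence with two unrelated phones would. Candidates with a low score would then be the ones worth playing back to a device.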

CRJul 7, 2020
VPS: Excavating High-Level C++ Constructs from Low-Level Binaries to Protect Dynamic Dispatching

Andre Pawlowski, Victor van der Veen, Dennis Andriesse et al.

Polymorphism and inheritance make C++ suitable for writing complex software, but significantly increase the attack surface because the implementation relies on virtual function tables (vtables). These vtables contain function pointers that attackers can potentially hijack, and in practice, vtable hijacking is one of the most important attack vectors for C++ binaries. In this paper, we present VTable Pointer Separation (VPS), a practical binary-level defense against vtable hijacking in C++ applications. Unlike previous binary-level defenses, which rely on unsound static analyses to match classes to virtual callsites, VPS achieves a more accurate protection by restricting virtual callsites to validly created objects. More specifically, VPS ensures that virtual callsites can only use objects created at valid object construction sites, and only if those objects can reach the callsite. Moreover, VPS explicitly prevents false positives (falsely identified virtual callsites) from breaking the binary, an issue existing work does not handle correctly or at all. We evaluate the prototype implementation of VPS on a diverse set of complex, real-world applications (MongoDB, MySQL server, Node.js, SPEC CPU2017/CPU2006), showing that our approach protects on average 97.8% of all virtual callsites in SPEC CPU2006 and 97.4% in SPEC CPU2017 (all C++ benchmarks), with a moderate performance overhead of 11% and 9% geomean, respectively. Furthermore, our evaluation reveals 86 false negatives in VTV, a popular source-based defense which is part of GCC.

CRJul 6, 2020
Automated Multi-Architectural Discovery of CFI-Resistant Code Gadgets

Patrick Wollgast, Robert Gawlik, Behrad Garmany et al.

Memory corruption vulnerabilities are still a severe threat for software systems. To thwart the exploitation of such vulnerabilities, many different kinds of defenses have been proposed in the past. Most prominently, Control-Flow Integrity (CFI) has received a lot of attention recently. Several proposals were published that apply coarse-grained policies with a low performance overhead. However, their security remains questionable as recent attacks have shown. To ease the assessment of a given CFI implementation, we introduce a framework to discover code gadgets for code-reuse attacks that conform to coarse-grained CFI policies. For this purpose, binary code is extracted and transformed to a symbolic representation in an architecture-independent manner. Additionally, code gadgets are verified to provide the needed functionality for a security researcher. We show that our framework finds more CFI-compatible gadgets compared to other code gadget discovery tools. Furthermore, we demonstrate that code gadgets needed to bypass CFI solutions on the ARM architecture can be discovered by our framework as well.

CRJul 6, 2020
Detile: Fine-Grained Information Leak Detection in Script Engines

Robert Gawlik, Philipp Koppe, Benjamin Kollenda et al.

Memory disclosure attacks play an important role in the exploitation of memory corruption vulnerabilities. By analyzing recent research, we observe that bypasses of defensive solutions that enforce control-flow integrity or attempt to detect return-oriented programming require memory disclosure attacks as a fundamental first step. However, research lags behind in detecting such information leaks. In this paper, we tackle this problem and present a system for fine-grained, automated detection of memory disclosure attacks against scripting engines. The basic insight is as follows: scripting languages, such as JavaScript in web browsers, are strictly sandboxed. They must not provide any insights about the memory layout in their contexts. In fact, any such information potentially represents an ongoing memory disclosure attack. Hence, to detect information leaks, our system creates a clone of the scripting engine process with a re-randomized memory layout. The clone is instrumented to be synchronized with the original process. Any inconsistency in the script contexts of both processes appears when a memory disclosure was conducted to leak information about the memory layout. Based on this detection approach, we have designed and implemented Detile (detection of information leaks), a prototype for the JavaScript engine in Microsoft's Internet Explorer 10/11 on Windows 8.0/8.1. An empirical evaluation shows that our tool can successfully detect memory disclosure attacks even against this proprietary software.

CRJul 6, 2020
An Exploratory Analysis of Microcode as a Building Block for System Defenses

Benjamin Kollenda, Philipp Koppe, Marc Fyrbiak et al.

Microcode is an abstraction layer used by modern x86 processors that interprets user-visible CISC instructions to hardware-internal RISC instructions. The capability to update x86 microcode enables a vendor to modify CPU behavior in-field, and thus patch erroneous microarchitectural processes or even implement new features. Most prominently, the recent Spectre and Meltdown vulnerabilities were mitigated by Intel via microcode updates. Unfortunately, microcode is proprietary and closed source, and there is little publicly available information on its inner workings. In this paper, we present new reverse engineering results that extend and complement the public knowledge of proprietary microcode. Based on these novel insights, we show how modern system defenses and tools can be realized in microcode on a commercial, off-the-shelf AMD x86 CPU. We demonstrate how well-established system security defenses such as timing attack mitigations, hardware-assisted address sanitization, and instruction set randomization can be realized in microcode. We also present a proof-of-concept implementation of a microcode-assisted instrumentation framework. Finally, we show how a secure microcode update mechanism and enclave functionality can be implemented in microcode to realize a small trusted execution environment. All microcode programs and the whole infrastructure needed to reproduce and extend our results are publicly available.

CRJul 5, 2020
Breaking and Fixing Destructive Code Read Defenses

Jannik Pewny, Philipp Koppe, Lucas Davi et al.

Just-in-time return-oriented programming (JIT-ROP) is a powerful memory corruption attack that bypasses various forms of code randomization. Execute-only memory (XOM) can potentially prevent these attacks, but requires source code. In contrast, destructive code reads (DCR) provide a trade-off between security and legacy compatibility. The common belief is that DCR provides strong protection if combined with a high-entropy code randomization. The contribution of this paper is twofold: first, we demonstrate that DCR can be bypassed regardless of the underlying code randomization scheme. To this end, we show novel, generic attacks that infer the code layout for highly randomized program code. Second, we present the design and implementation of BGDX (Byte-Granular DCR and XOM), a novel mitigation technique that protects legacy binaries against code inference attacks. BGDX enforces memory permissions on a byte-granular level allowing us to combine DCR and XOM for legacy, off-the-shelf binaries. Our evaluation shows that BGDX is not only effective, but highly efficient, imposing only a geometric mean performance overhead of 3.95% on SPEC.

CRJul 5, 2020
Static Detection of Uninitialized Stack Variables in Binary Code

Behrad Garmany, Martin Stoffel, Robert Gawlik et al.

More than two decades after the first stack smashing attacks, memory corruption vulnerabilities utilizing stack anomalies are still prevalent and play an important role in practice. Among such vulnerabilities, uninitialized variables play an exceptional role due to their unpleasant property of unpredictability: as compilers are tailored to operate fast, costly interprocedural analysis procedures are not used in practice to detect such vulnerabilities. As a result, complex relationships that expose uninitialized memory reads remain undiscovered in binary code. Recent vulnerability reports show the versatility on how uninitialized memory reads are utilized in practice, especially for memory disclosure and code execution. Research in recent years proposed detection and prevention techniques tailored to source code. To date, however, there has not been much attention for these types of software bugs within binary executables. In this paper, we present a static analysis framework to find uninitialized variables in binary executables. We developed methods to lift the binaries into a knowledge representation which builds the base for specifically crafted algorithms to detect uninitialized reads. Our prototype implementation is capable of detecting uninitialized memory errors in complex binaries such as web browsers and OS kernels, and we detected 7 novel bugs.

CRJul 5, 2020
Steroids for DOPed Applications: A Compiler for Automated Data-Oriented Programming

Jannik Pewny, Philipp Koppe, Thorsten Holz

The wide-spread adoption of system defenses such as the randomization of code, stack, and heap raises the bar for code-reuse attacks. Thus, attackers utilize a scripting engine in target programs like a web browser to prepare the code-reuse chain, e.g., relocate gadget addresses or perform a just-in-time gadget search. However, many types of programs do not provide such an execution context that an attacker can use. Recent advances in data-oriented programming (DOP) explored an orthogonal way to abuse memory corruption vulnerabilities and demonstrated that an attacker can achieve Turing-complete computations without modifying code pointers in applications. As of now, constructing DOP exploits requires a lot of manual work. In this paper, we present novel techniques to automate the process of generating DOP exploits. We implemented a compiler called Steroids that compiles our high-level language SLANG into low-level DOP data structures driving malicious computations at run time. This enables an attacker to specify her intent in an application- and vulnerability-independent manner to maximize reusability. We demonstrate the effectiveness of our techniques and prototype implementation by specifying four programs of varying complexity in SLANG that calculate the Levenshtein distance, traverse a pointer chain to steal a private key, relocate a ROP chain, and perform a JIT-ROP attack. Steroids compiles each of those programs to low-level DOP data structures targeted at five different applications including GStreamer, Wireshark, and ProFTPd, which have vastly different vulnerabilities and DOP instances. Ultimately, this shows that our compiler is versatile, can be used for both 32- and 64-bit applications, works across bug classes, and enables highly expressive attacks without conventional code-injection or code-reuse techniques in applications lacking a scripting engine.

CRJul 5, 2020
Challenges in Designing Exploit Mitigations for Deeply Embedded Systems

Ali Abbasi, Jos Wetzels, Thorsten Holz et al.

Memory corruption vulnerabilities have been around for decades and rank among the most prevalent vulnerabilities in embedded systems. Yet this constrained environment poses unique design and implementation challenges that significantly complicate the adoption of common hardening techniques. Combined with the irregular and involved nature of embedded patch management, this results in prolonged vulnerability exposure windows and vulnerabilities that are relatively easy to exploit. Considering the sensitive and critical nature of many embedded systems, this situation merits significant improvement. In this work, we present the first quantitative study of exploit mitigation adoption in 42 embedded operating systems, showing the embedded world to significantly lag behind the general-purpose world. To improve the security of deeply embedded systems, we subsequently present μArmor, an approach to address some of the key gaps identified in our quantitative analysis. μArmor raises the bar for exploitation of embedded memory corruption vulnerabilities, while being adoptable on the short term without incurring prohibitive extra performance or storage costs.

CRApr 2, 2020
CORSICA: Cross-Origin Web Service Identification

Christian Dresen, Fabian Ising, Damian Poddebniak et al.

Vulnerabilities in private networks are difficult to detect for attackers outside of the network. While there are known methods for port scanning internal hosts that work by luring unwitting internal users to an external web page that hosts malicious JavaScript code, no such method for detailed and precise service identification is known. The reason is that the Same Origin Policy (SOP) prevents access to HTTP responses of other origins by default. We perform a structured analysis of loopholes in the SOP that can be used to identify web applications across network boundaries. For this, we analyze HTML5, CSS, and JavaScript features of standard-compliant web browsers that may leak sensitive information about cross-origin content. The results reveal several novel techniques, including leaking JavaScript function names or styles of cross-origin requests that are available in all common browsers. We implement and test these techniques in a tool called CORSICA. It can successfully identify 31 of 42 (74%) web services running on different IoT devices as well as the version numbers of the four most widely used content management systems WordPress, Drupal, Joomla, and TYPO3. CORSICA can also determine the patch level on average down to three versions (WordPress), six versions (Drupal), two versions (Joomla), and four versions (TYPO3) with only ten requests on average. Furthermore, CORSICA is able to identify 48 WordPress plugins containing 65 vulnerabilities. Finally, we analyze mitigation strategies and show that the proposed but not yet implemented strategies Cross-Origin Resource Policy (CORP) and Sec-Metadata would prevent our identification techniques.

CVMar 19, 2020
Leveraging Frequency Analysis for Deep Fake Image Recognition

Joel Frank, Thorsten Eisenhofer, Lea Schönherr et al.

Deep neural networks can generate images that are astonishingly realistic, so much so that it is often hard for humans to distinguish them from actual photos. These achievements have been largely made possible by Generative Adversarial Networks (GANs). While deep fake images have been thoroughly investigated in the image domain - a classical approach from the area of image forensics - an analysis in the frequency domain has been missing so far. In this paper, we address this shortcoming and our results reveal that in frequency space, GAN-generated images exhibit severe artifacts that can be easily identified. We perform a comprehensive analysis, showing that these artifacts are consistent across different neural network architectures, data sets, and resolutions. In a further investigation, we demonstrate that these artifacts are caused by upsampling operations found in all current GAN architectures, indicating a structural and fundamental problem in the way images are generated via GANs. Based on this analysis, we demonstrate how the frequency representation can be used to identify deep fake images in an automated way, surpassing state-of-the-art methods.
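The link between upsampling and frequency-space artifacts can be illustrated with a one-dimensional toy example. This is only a sketch of the general signal-processing effect, not the paper's analysis: it uses a naive DFT on a short made-up signal and zero-insertion upsampling, and the names `dft` and `zero_insert_upsample` are introduced here for illustration.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform (O(n^2)), fine for a toy signal."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def zero_insert_upsample(signal):
    """Double the length by inserting a zero after every sample."""
    upsampled = []
    for sample in signal:
        upsampled.extend([sample, 0.0])
    return upsampled


# A smooth, made-up signal whose energy sits mostly at low frequencies.
signal = [1.0, 0.6, 0.3, 0.1, 0.0, 0.1, 0.3, 0.6]

original_spectrum = dft(signal)
upsampled_spectrum = dft(zero_insert_upsample(signal))

# Print the magnitude of every frequency bin of the upsampled signal.
for k, value in enumerate(upsampled_spectrum):
    print(k, round(abs(value), 3))
```

In this toy setting the upsampled spectrum contains a second copy of the original spectrum in its upper half. Interpolation filters applied after upsampling attenuate such copies, and any residue left behind is one plausible source of the kind of high-frequency pattern a detector could look for.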

CRJan 28, 2020
Beyond the Front Page: Measuring Third Party Dynamics in the Field

Tobias Urban, Martin Degeling, Thorsten Holz et al.

In the modern Web, service providers often rely heavily on third parties to run their services. For example, they make use of ad networks to finance their services, externally hosted libraries to develop features quickly, and analytics providers to gain insights into visitor behavior. For security and privacy, website owners need to be aware of the content they provide their users. However, in reality, they often do not know which third parties are embedded, for example, when these third parties request additional content, as is common in real-time ad auctions. In this paper, we present a large-scale measurement study to analyze the magnitude of these new challenges. To better reflect the connectedness of third parties, we measured their relations in a model we call third party trees, which reflects an approximation of the loading dependencies of all third parties embedded into a given website. Using this concept, we show that including a single third party can lead to subsequent requests from up to eight additional services. Furthermore, our findings indicate that the third parties embedded on a page load are not always deterministic, as 50% of the branches in the third party trees change between repeated visits. In addition, we found that 93% of the analyzed websites embedded third parties that are located in regions that might not be in line with the current legal framework. Our study also replicates previous work that mostly focused on landing pages of websites. We show that this method is only able to measure a lower bound, as subsites show a significant increase of privacy-invasive techniques. For example, our results show an increase of used cookies by about 36% when crawling websites more deeply.

CROct 1, 2019
Reverse Engineering x86 Processor Microcode

Philipp Koppe, Benjamin Kollenda, Marc Fyrbiak et al.

Microcode is an abstraction layer on top of the physical components of a CPU and present in most general-purpose CPUs today. In addition to facilitating complex and vast instruction sets, it also provides an update mechanism that allows CPUs to be patched in-place without requiring any special hardware. While it is well-known that CPUs are regularly updated with this mechanism, very little is known about its inner workings given that microcode and the update mechanism are proprietary and have not been thoroughly analyzed yet. In this paper, we reverse engineer the microcode semantics and inner workings of its update mechanism of conventional COTS CPUs on the example of AMD's K8 and K10 microarchitectures. Furthermore, we demonstrate how to develop custom microcode updates. We describe the microcode semantics and additionally present a set of microprograms that demonstrate the possibilities offered by this technology. To this end, our microprograms range from CPU-assisted instrumentation to microcoded Trojans that can even be reached from within a web browser and enable remote code execution and cryptographic implementation attacks.

HCSep 5, 2019
(Un)informed Consent: Studying GDPR Consent Notices in the Field

Christine Utz, Martin Degeling, Sascha Fahl et al.

Since the adoption of the General Data Protection Regulation (GDPR) in May 2018, more than 60% of popular websites in Europe display cookie consent notices to their visitors. This has quickly led to users becoming fatigued with privacy notifications and contributed to the rise of both browser extensions that block these banners and demands for a solution that bundles consent across multiple websites or in the browser. In this work, we identify common properties of the graphical user interface of consent notices and conduct three experiments with more than 80,000 unique users on a German website to investigate the influence of notice position, type of choice, and content framing on consent. We find that users are more likely to interact with a notice shown in the lower (left) part of the screen. Given a binary choice, more users are willing to accept tracking compared to mechanisms that require them to allow cookie use for each category or company individually. We also show that the wide-spread practice of nudging has a large effect on the choices users make. Our experiments show that seemingly small implementation decisions can substantially impact whether and how people interact with consent notices. Our findings demonstrate the importance for regulation to not just require consent, but also provide clear requirements or guidance for how this consent has to be obtained in order to ensure that users can make free and informed choices.

CRAug 5, 2019
Imperio: Robust Over-the-Air Adversarial Examples for Automatic Speech Recognition Systems

Lea Schönherr, Thorsten Eisenhofer, Steffen Zeiler et al.

Automatic speech recognition (ASR) systems can be fooled via targeted adversarial examples, which induce the ASR to produce arbitrary transcriptions in response to altered audio signals. However, state-of-the-art adversarial examples typically have to be fed into the ASR system directly, and are not successful when played in a room. The few published over-the-air adversarial examples fall into one of three categories: they are either handcrafted examples, they are so conspicuous that human listeners can easily recognize the target transcription once they are alerted to its content, or they require precise information about the room where the attack takes place, and are hence not transferable to other rooms. In this paper, we demonstrate the first algorithm that produces generic adversarial examples, which remain robust in an over-the-air attack that is not adapted to the specific environment. Hence, no prior knowledge of the room characteristics is required. Instead, we use room impulse responses (RIRs) to compute robust adversarial examples for arbitrary room characteristics and employ the ASR system Kaldi to demonstrate the attack. Further, our algorithm can utilize psychoacoustic methods to hide changes of the original audio signal below the human thresholds of hearing. In practical experiments, we show that the adversarial examples work for varying room setups, and that no direct line-of-sight between speaker and microphone is necessary. As a result, an attacker can create inconspicuous adversarial examples for any target transcription and apply these to arbitrary room setups without any prior knowledge.

CRJul 3, 2019
Towards Automated Application-Specific Software Stacks

Nicolai Davidsson, Andre Pawlowski, Thorsten Holz

Software complexity has increased over the years. One common way to tackle this complexity during development is to encapsulate features into a shared library. This allows developers to reuse already implemented features instead of reimplementing them over and over again. However, not all features provided by a shared library are actually used by an application. As a result, an application using shared libraries loads unused code into memory, which an attacker can use to perform code-reuse and similar types of attacks. The same holds for applications written in a scripting language such as PHP or Ruby: The interpreter typically offers much more functionality than is actually required by the application and hence provides a larger overall attack surface. In this paper, we tackle this problem and propose a first step towards automated application-specific software stacks. We present a compiler extension capable of removing unneeded code from shared libraries and, with the help of domain knowledge, also capable of removing unused functionalities from an interpreter's code base during the compilation process. Our evaluation against a diverse set of real-world applications, among others Nginx, Lighttpd, and the PHP interpreter, removes on average 71.3% of the code in musl-libc, a popular libc implementation. The evaluation on web applications shows that a tailored PHP interpreter can mitigate entire vulnerability classes, as is the case for OpenConf. We demonstrate the applicability of our debloating approach by creating an application-specific software stack for a WordPress web application: we tailor the libc library to the Nginx web server and PHP interpreter, whereas the PHP interpreter is tailored to the WordPress web application. In this real-world scenario, the code of the libc is decreased by 65.1% in total, thereby reducing the available code for code-reuse attacks.

CRFeb 22, 2019
A Study of Newly Observed Hostnames and DNS Tunneling in the Wild

Dennis Tatang, Florian Quinkert, Nico Dolecki et al.

The domain name system (DNS) is a crucial backbone of the Internet and millions of new domains are created on a daily basis. While the vast majority of these domains are legitimate, adversaries also register new hostnames to carry out nefarious purposes, such as scams, phishing, or other types of attacks. In this paper, we present insights on the global utilization of DNS through a measurement study examining exclusively newly observed hostnames via passive DNS data analysis. We analyzed more than two billion such hostnames collected over a period of two months. Surprisingly, we find that only three second-level domains are responsible for more than half of all newly observed hostnames every day. More specifically, we found that Google's Accelerated Mobile Pages (AMP) project, the music streaming service Spotify, and a DNS tunnel provider generate the majority of new domains on the Internet. DNS tunneling is a covert channel technique to transfer arbitrary information over DNS via DNS queries and answers. This technique is often (ab)used by attackers to transfer data in a stealthy way, bypassing traditional network security systems. We find that potential DNS tunnels cause a significant fraction of the global DNS requests for new hostnames: our analysis reveals that nearly all resource record type NULL requests and more than a third of all TXT requests can be attributed to DNS tunnels. Motivated by these empirical measurement results, we propose and implement a method to identify DNS tunnels via a step-wise filtering approach that relies on general characteristics of such tunnels (e.g., number of subdomains or resource record type). Using our approach on empirical data, we successfully identified 273 suspicious domains related to DNS tunnels, including two known APT campaigns (Wekby and APT32).

CRNov 21, 2018
The Unwanted Sharing Economy: An Analysis of Cookie Syncing and User Transparency under GDPR

Tobias Urban, Dennis Tatang, Martin Degeling et al.

The European General Data Protection Regulation (GDPR), which went into effect in May 2018, leads to important changes in this area: companies are now required to ask for users' consent before collecting and sharing personal data and by law users now have the right to gain access to the personal information collected about them. In this paper, we study and evaluate the effect of the GDPR on the online advertising ecosystem. In a first step, we measure the impact of the legislation on the connections (regarding cookie syncing) between third parties and show that the general structure of how the entities are arranged is not affected by the GDPR. However, we find that the new regulation has a statistically significant impact on the number of connections, which shrinks by around 40%. Furthermore, we analyze the right to data portability by evaluating the subject access right process of popular companies in this ecosystem and observe differences between the processes implemented by the companies and how they interpret the new legislation. We exercised our right of access under GDPR with 36 companies that had tracked us online. Although 32 companies (89%) we contacted replied within the period defined by law, only 21 (58%) finished the process by the deadline set in the GDPR. Our work has implications regarding the implementation of privacy law as well as what online tracking companies should do to be more compliant with the new regulation.

CRAug 16, 2018
Adversarial Attacks Against Automatic Speech Recognition Systems via Psychoacoustic Hiding

Lea Schönherr, Katharina Kohls, Steffen Zeiler et al.

Voice interfaces are becoming accepted widely as input methods for a diverse set of devices. This development is driven by rapid improvements in automatic speech recognition (ASR), which now performs on par with human listening in many tasks. These improvements are based on an ongoing evolution of DNNs as the computational core of ASR. However, recent research results show that DNNs are vulnerable to adversarial perturbations, which allow attackers to force the transcription into a malicious output. In this paper, we introduce a new type of adversarial example based on psychoacoustic hiding. Our attack exploits the characteristics of DNN-based ASR systems, where we extend the original analysis procedure by an additional backpropagation step. We use this backpropagation to learn the degrees of freedom for the adversarial perturbation of the input signal, i.e., we apply a psychoacoustic model and manipulate the acoustic signal below the thresholds of human perception. To further minimize the perceptibility of the perturbations, we use forced alignment to find the best fitting temporal alignment between the original audio sample and the malicious target transcription. These extensions allow us to embed an arbitrary audio input with a malicious voice command that is then transcribed by the ASR system, with the audio signal remaining barely distinguishable from the original signal. In an experimental evaluation, we attack the state-of-the-art speech recognition system Kaldi and determine the best performing parameter and analysis setup for different types of input. Our results show that we are successful in up to 98% of cases with a computational effort of fewer than two minutes for a ten-second audio file. Based on user studies, we found that none of our target transcriptions were audible to human listeners, who still understand the original speech content with unchanged accuracy.

CRMar 5, 2018
RAPTOR: Ransomware Attack PredicTOR

Florian Quinkert, Thorsten Holz, KSM Tozammel Hossain et al.

Ransomware, a type of malicious software that encrypts a victim's files and only releases the cryptographic key once a ransom is paid, has emerged as a potentially devastating class of cybercrimes in the past few years. In this paper, we present RAPTOR, a promising line of defense against ransomware attacks. RAPTOR fingerprints attackers' operations to forecast ransomware activity. More specifically, our method learns features of malicious domains by looking at examples of domains involved in known ransomware attacks, and then monitors newly registered domains to identify potentially malicious ones. In addition, RAPTOR uses time series forecasting techniques to learn models of historical ransomware activity and then leverages malicious domain registrations as an external signal to forecast future ransomware activity. We illustrate RAPTOR's effectiveness by forecasting all activity stages of Cerber, a popular ransomware family. By monitoring zone files of the top-level domain .top starting from August 30, 2016 through May 31, 2017, RAPTOR predicted 2,126 newly registered domains to be potential Cerber domains. Of these, 378 later actually appeared in blacklists. Our empirical evaluation results show that using predicted domain registrations helped improve forecasts of future Cerber activity. Most importantly, our approach demonstrates the value of fusing different signals in forecasting applications in the cyber domain.

CRDec 8, 2017
An Empirical Study on Price Differentiation Based on System Fingerprints

Thomas Hupperich, Dennis Tatang, Nicolai Wilkop et al.

Price differentiation describes a marketing strategy to determine the price of goods on the basis of a potential customer's attributes like location, financial status, possessions, or behavior. Several cases of online price differentiation have been revealed in recent years. For example, different pricing based on a user's location was discovered for online office supply chain stores and there were indications that offers for hotel rooms are priced higher for Apple users compared to Windows users at certain online booking websites. One potential source for relevant distinctive features are system fingerprints, i.e., a technique to recognize users' systems by identifying unique attributes such as the source IP address or system configuration. In this paper, we shed light on the ecosystem of pricing at online platforms and aim to detect if and how such platform providers make use of price differentiation based on digital system fingerprints. We designed and implemented an automated price scanner capable of disguising itself as an arbitrary system, leveraging real-world system fingerprints, and searched for price differences related to different features (e.g., user location, language setting, or operating system). This system allows us to explore price differentiation cases and expose those characteristic features of a system that may influence a product's price.

CROct 24, 2017
On Security Research Towards Future Mobile Network Generations

David Rupprecht, Adrian Dabrowski, Thorsten Holz et al.

Over the last decades, numerous security and privacy issues in all three active mobile network generations have been revealed that threaten users as well as network providers. In view of the newest generation (5G) currently under development, we now have the unique opportunity to identify research directions for the next generation based on existing security and privacy issues as well as already proposed defenses. This paper aims to unify security knowledge on mobile phone networks into a comprehensive overview and to derive pressing open research questions. To achieve this systematically, we develop a methodology that categorizes known attacks by their aim, proposed defenses, underlying causes, and root causes. Further, we assess the impact and the efficacy of each attack and defense. We then apply this methodology to existing literature on attacks and defenses in all three network generations. By doing so, we identify ten causes and four root causes of attacks. Mapping the attacks to proposed defenses and suggestions for the 5G specification enables us to uncover open research questions and challenges for the development of next-generation mobile networks. The problems of unsecured pre-authentication traffic and jamming attacks exist across all three mobile generations. They should be addressed in the future, in particular, to wipe out the class of downgrade attacks and, thereby, strengthen the users' privacy. Further advances are needed in the areas of inter-operator protocols as well as secure baseband implementations. Additionally, mitigations against denial-of-service attacks by smart protocol design represent an open research question.