89.1LGMay 29
Geometric Erasure by Contrastive Velocity Matching in Rectified FlowsJonas Henry Grebe, Tobias Braun, Anna Rohrbach et al.
While the rapid adoption of multimodal generative models offers immense potential, it has also increased the risks of harmful content synthesis, deepfakes, and copyright infringements. To address these challenges, concept erasure has emerged as a prospective safeguard. However, as the field gradually transitions from U-Net-based diffusion models to Rectified Flow Transformers, erasure research has struggled to keep pace. In this work, we introduce GEM, a simple but highly effective erasure framework for Rectified Flow models. As part of our contribution, we establish a principled bridge between trajectory-based unlearning grounded in Generative Flow Networks and classic teacher-guided erasure: we translate trajectory-based signals into a teacher-guided flow-matching setup that unifies the strengths of both paradigms. Concretely, a teacher provides complementary attraction and repulsion signals that we combine into a single geometric guidance objective, yielding targeted suppression of unwanted concepts while preserving benign generation.
SYJul 11, 2023
Realtime Spectrum Monitoring via Reinforcement Learning -- A Comparison Between Q-Learning and Heuristic MethodsTobias Braun, Tobias Korzyzkowske, Larissa Putzar et al.
Due to technological advances in the field of radio technology and its availability, the number of interference signals in the radio spectrum is continuously increasing. Interference signals must be detected in a timely fashion, in order to maintain standards and keep emergency frequencies open. To this end, specialized (multi-channel) receivers are used for spectrum monitoring. In this paper, the performances of two different approaches for controlling the available receiver resources are compared. The methods used for resource management (ReMa) are linear frequency tuning as a heuristic approach and a Q-learning algorithm from the field of reinforcement learning. To test the methods to be investigated, a simplified scenario was designed with two receiver channels monitoring ten non-overlapping frequency bands with non-uniform signal activity. For this setting, it is shown that the Q-learning algorithm used has a significantly higher detection rate than the heuristic approach at the expense of a smaller exploration rate. In particular, the Q-learning approach can be parameterized to allow for a suitable trade-off between detection and exploration rate.
90.8CRMay 19
Token by Token, Compromised: Backdoor Vulnerabilities in Unified Autoregressive ModelsTobias Braun, Jonas Henry Grebe, Hossein Shakibania et al.
Unified autoregressive models (UAMs) are transformer models that generate text as well as image tokens within a single autoregressive pass. Shared parameters and a multimodal vocabulary simplify the training pipeline and facilitate flexible multimodal generation, yet might introduce new vulnerabilities. In particular, we are the first to show that this unified architecture enables multimodal backdoor attacks, where a trigger can propagate malicious effects across multiple output modalities. Specifically, we present the Token by Token Backdoor Attack (ToBAC), the first backdoor attack targeting UAMs, exploring both data-based and model-based poisoning strategies. We demonstrate that innocuous characters or even common words can be transformed into triggers that elicit harmful behavior in autoregressive image generation. ToBAC can jointly manipulate visual outputs and accompanying text, increasing the perceived authenticity of fabricated content. With model access, ToBAC enables attacks on the unified Liquid model in which a subtle word (e.g., ``cool'') induces modality-aligned brand promotion or ideological influence in 55% of generations. Without model access, ToBAC can be induced through data poisoning, achieving an average success rate of 63.1% against JanusPro.
LGMar 4
Tuning Just Enough: Lightweight Backdoor Attacks on Multi-Encoder Diffusion ModelsZiyuan Chen, Yujin Jeong, Tobias Braun et al.
As text-to-image diffusion models become increasingly deployed in real-world applications, concerns about backdoor attacks have gained significant attention. Prior work on text-based backdoor attacks has largely focused on diffusion models conditioned on a single lightweight text encoder. However, more recent diffusion models that incorporate multiple large-scale text encoders remain underexplored in this context. Given the substantially increased number of trainable parameters introduced by multiple text encoders, an important question is whether backdoor attacks can remain both efficient and effective in such settings. In this work, we study Stable Diffusion 3, which uses three distinct text encoders and has not yet been systematically analyzed for text-encoder-based backdoor vulnerabilities. To understand the role of text encoders in backdoor attacks, we define four categories of attack targets and identify the minimal sets of encoders required to achieve effective performance for each attack objective. Based on this, we further propose Multi-Encoder Lightweight aTtacks (MELT), which trains only low-rank adapters while keeping the pretrained text encoder weight frozen. We demonstrate that tuning fewer than 0.2% of the total encoder parameters is sufficient for successful backdoor attacks on Stable Diffusion 3, revealing previously underexplored vulnerabilities in practical attack scenarios in multi-encoder settings.
CVDec 13, 2024
DEFAME: Dynamic Evidence-based FAct-checking with Multimodal ExpertsTobias Braun, Mark Rothermel, Marcus Rohrbach et al.
The proliferation of disinformation demands reliable and scalable fact-checking solutions. We present Dynamic Evidence-based FAct-checking with Multimodal Experts (DEFAME), a modular, zero-shot MLLM pipeline for open-domain, text-image claim verification. DEFAME operates in a six-stage process, dynamically selecting the tools and search depth to extract and evaluate textual and visual evidence. Unlike prior approaches that are text-only, lack explainability, or rely solely on parametric knowledge, DEFAME performs end-to-end verification, accounting for images in claims and evidence while generating structured, multimodal reports. Evaluation on the popular benchmarks VERITE, AVerITeC, and MOCHEG shows that DEFAME surpasses all previous methods, establishing itself as the new state-of-the-art fact-checking system for uni- and multimodal fact-checking. Moreover, we introduce a new multimodal benchmark, ClaimReview2024+, featuring claims after the knowledge cutoff of GPT-4o, avoiding data leakage. Here, DEFAME drastically outperforms the GPT-4o baselines, showing temporal generalizability and the potential for real-time fact-checking.
CRApr 29, 2025
Erased but Not Forgotten: How Backdoors Compromise Concept ErasureJonas Henry Grebe, Tobias Braun, Marcus Rohrbach et al.
The expansion of large-scale text-to-image diffusion models has raised growing concerns about their potential to generate undesirable or harmful content, ranging from fabricated depictions of public figures to sexually explicit images. To mitigate these risks, prior work has devised machine unlearning techniques that attempt to erase unwanted concepts through fine-tuning. However, in this paper, we introduce a new threat model, Toxic Erasure (ToxE), and demonstrate how recent unlearning algorithms, including those explicitly designed for robustness, can be circumvented through targeted backdoor attacks. The threat is realized by establishing a link between a trigger and the undesired content. Subsequent unlearning attempts fail to erase this link, allowing adversaries to produce harmful content. We instantiate ToxE via two established backdoor attacks: one targeting the text encoder and another manipulating the cross-attention layers. Further, we introduce Deep Intervention Score-based Attack (DISA), a novel, deeper backdoor attack that optimizes the entire U-Net using a score-based objective, improving the attack's persistence across different erasure methods. We evaluate five recent concept erasure methods against our threat model. For celebrity identity erasure, our deep attack circumvents erasure with up to 82% success, averaging 57% across all erasure methods. For explicit content erasure, ToxE attacks can elicit up to 9 times more exposed body parts, with DISA yielding an average increase by a factor of 2.9. These results highlight a critical security gap in current unlearning strategies.
HCSep 20, 2021
Visually Connecting Historical Figures Through Event Knowledge GraphsShahid Latif, Shivam Agarwal, Simon Gottschalk et al.
Knowledge graphs store information about historical figures and their relationships indirectly through shared events. We developed a visualization system, VisKonnect, for analyzing the intertwined lives of historical figures based on the events they participated in. A user's query is parsed for identifying named entities, and related data is retrieved from an event knowledge graph. While a short textual answer to the query is generated using the GPT-3 language model, various linked visualizations provide context, display additional information related to the query, and allow exploration.
CRJul 6, 2021
An Agnostic Domain Specific Language for Implementing Attacks in an Automotive Use CaseChristian Wolschke, Stefan Marksteiner, Tobias Braun et al.
This paper presents a Domain Specific Language (DSL) for generically describing cyber attacks, agnostic to specific system-under-test(SUT). The creation of the presented DSL is motivated by an automotive use case. The concepts of the DSL are generic such thatattacks on arbitrary systems can be addressed.The ongoing trend to improve the user experience of vehicles with connected services implies an enhanced connectivity as well asremote accessible interface opens potential attack vectors. This might also impact safety and the proprietary nature of potential SUTs.Reusing tests of attack vectors to industrialize testing them on multiple SUTs mandates an abstraction mechanism to port an attackfrom one system to another. The DSL therefore generically describes attacks for the usage with a test case generator (and executionenvironment) also described in this paper. The latter use this description and a database with SUT-specific information to generateattack implementations for a multitude of different (automotive) SUTs.
CRJun 25, 2021
SaSeVAL: A Safety/Security-Aware Approach for Validation of Safety-Critical SystemsChristian Wolschke, Behrooz Sangchoolie, Jacob Simon et al.
Increasing communication and self-driving capabilities for road vehicles lead to threats imposed by attackers. Especially attacks leading to safety violations have to be identified to address them by appropriate measures. The impact of an attack depends on the threat exploited, potential countermeasures and the traffic situation. In order to identify such attacks and to use them for testing, we propose the systematic approach SaSeVAL for deriving attacks of autonomous vehicles. SaSeVAL is based on threats identification and safety-security analysis. The impact of automotive use cases to attacks is considered. The threat identification considers the attack interface of vehicles and classifies threat scenarios according to threat types, which are then mapped to attack types. The safety-security analysis identifies the necessary requirements which have to be tested based on the architecture of the system under test. lt determines which safety impact a security violation may have, and in which traffic situations the highest impact is expected. Finally, the results of threat identification and safety-security analysis are used to describe attacks. The goal of SaSeVAL is to achieve safety validation of the vehicle w.r.t. security concerns. lt traces safety goals to threats and to attacks explicitly. Hence, the coverage of safety concerns by security testing is assured. Two use cases of vehicle communication and autonomous driving are investigated to prove the applicability of the approach.