NAMar 8, 2018
MULTIBAT: Unified workflow for fast electrochemical 3D simulations of lithium-ion cells combining virtual stochastic microstructures, electrochemical degradation models and model order reductionJulian Feinauer, Simon Hein, Stephan Rave et al.
We present a simulation workflow for efficient investigations of the interplay between 3D lithium-ion electrode microstructures and electrochemical performance, with emphasis on lithium plating. Our approach addresses several challenges. First, the 3D microstructures of porous electrodes are generated by a parametric stochastic model, in order to significantly reduce the necessity of tomographic imaging. Secondly, we integrate a consistent microscopic, 3D spatially-resolved physical model for the electrochemical behavior of the lithium-ion cells taking lithium plating and stripping into account. This highly non-linear mathematical model is solved numerically on the complex 3D microstructures to compute the transient cell behavior. Due to the complexity of the model and the considerable size of realistic microstructures even a single charging cycle of the battery requires several hours computing time. This renders large scale parameter studies extremely time consuming. Hence, we develop a mathematical model order reduction scheme. We demonstrate how these aspects are integrated into one unified workflow, which is a step towards computer aided engineering for the development of more efficient lithium-ion cells.
CVSep 11, 2023
Stream-based Active Learning by Exploiting Temporal Properties in Perception with Temporal Predicted LossSebastian Schmidt, Stephan Günnemann
Active learning (AL) reduces the amount of labeled data needed to train a machine learning model by intelligently choosing which instances to label. Classic pool-based AL requires all data to be present in a datacenter, which can be challenging with the increasing amounts of data needed in deep learning. However, AL on mobile devices and robots, like autonomous cars, can filter the data from perception sensor streams before reaching the datacenter. We exploited the temporal properties for such image streams in our work and proposed the novel temporal predicted loss (TPL) method. To evaluate the stream-based setting properly, we introduced the GTA V streets and the A2D2 streets dataset and made both publicly available. Our experiments showed that our approach significantly improves the diversity of the selection while being an uncertainty-based method. As pool-based approaches are more common in perception applications, we derived a concept for comparing pool-based and stream-based AL, where TPL out-performed state-of-the-art pool- or stream-based approaches for different models. TPL demonstrated a gain of 2.5 precept points (pp) less required data while being significantly faster than pool-based methods.
12.1LGMar 25
Amplified Patch-Level Differential Privacy for Free via Random CroppingKaan Durmaz, Jan Schuchardt, Sebastian Schmidt et al.
Random cropping is one of the most common data augmentation techniques in computer vision, yet the role of its inherent randomness in training differentially private machine learning models has thus far gone unexplored. We observe that when sensitive content in an image is spatially localized, such as a face or license plate, random cropping can probabilistically exclude that content from the model's input. This introduces a third source of stochasticity in differentially private training with stochastic gradient descent, in addition to gradient noise and minibatch sampling. This additional randomness amplifies differential privacy without requiring changes to model architecture or training procedure. We formalize this effect by introducing a patch-level neighboring relation for vision data and deriving tight privacy bounds for differentially private stochastic gradient descent (DP-SGD) when combined with random cropping. Our analysis quantifies the patch inclusion probability and shows how it composes with minibatch sampling to yield a lower effective sampling rate. Empirically, we validate that patch-level amplification improves the privacy-utility trade-off across multiple segmentation architectures and datasets. Our results demonstrate that aligning privacy accounting with domain structure and additional existing sources of randomness can yield stronger guarantees at no additional cost.
CLAug 19, 2024
Active Learning for Identifying Disaster-Related Tweets: A Comparison with Keyword Filtering and Generic Fine-TuningDavid Hanny, Sebastian Schmidt, Bernd Resch
Information from social media can provide essential information for emergency response during natural disasters in near real-time. However, it is difficult to identify the disaster-related posts among the large amounts of unstructured data available. Previous methods often use keyword filtering, topic modelling or classification-based techniques to identify such posts. Active Learning (AL) presents a promising sub-field of Machine Learning (ML) that has not been used much in the field of text classification of social media content. This study therefore investigates the potential of AL for identifying disaster-related Tweets. We compare a keyword filtering approach, a RoBERTa model fine-tuned with generic data from CrisisLex, a base RoBERTa model trained with AL and a fine-tuned RoBERTa model trained with AL regarding classification performance. For testing, data from CrisisLex and manually labelled data from the 2021 flood in Germany and the 2023 Chile forest fires were considered. The results show that generic fine-tuning combined with 10 rounds of AL outperformed all other approaches. Consequently, a broadly applicable model for the identification of disaster-related Tweets could be trained with very little labelling effort. The model can be applied to use cases beyond this study and provides a useful tool for further research in social media analysis.
16.2LGMay 19
Smooth Piecewise Cutting for Neural Operator to Handle Discontinuities and Sharp TransitionsHa Dang, Sebastian Schmidt, Juergen Hesser
Neural operators have achieved strong performance in learning solution operators of partial differential equations (PDEs), but their inherently continuous representations struggle to capture discontinuities and sharp transitions. Existing approaches typically approximate such features within continuous function spaces, often requiring increased model capacity and high-resolution data. In this work, we propose Cut-DeepONet, a two-stage training framework that explicitly models discontinuities while reducing learning complexity. Our approach reformulates the problem via a lifting strategy, partitioning the domain into smooth subregions while representing discontinuities as boundaries in a higher-dimensional space. This separation aligns the operator learning task with the inductive bias of neural networks and avoids directly approximating discontinuities. An additional network predicts input-dependent discontinuity locations for unseen inputs, which are then used to guide the neural operator in generating smooth components within each region. Experiments on benchmark PDEs show that Cut-DeepONet outperforms state-of-the-art methods, even when trained on low-resolution datasets. The method excels on problems with discontinuities and sharp transitions, while using fewer trainable parameters. Our results highlight the benefits of changing the representation of operator learning rather than increasing model complexity.
9.3CEMay 19
Building Acoustics 01: Finite Element Model of an Building Acoustics Test Facility to Predict the Sound Transmission Loss Based on DIN EN ISO 10140Sebastian Schmidt, Sabine C. Langer
In the context of building acoustics, sound transmission loss estimations are crucial to quantify the noise pollution in buildings. When developing building prototypes in the sense of an acoustic-oriented design process, it is desirable to have an virtual prototype, especially in early development stages, to estimate, for instance, the influence of different material or geometry configurations on to the sound transmission loss. This contribution aims to present a simple virtual prototype of an building acoustics test facility in accordance with DIN EN ISO 10140 for the measurement of the sound transmission loss of single- and double-leaf walls with and without insulation. Here, the finite element method is used as the numerical modelling method of choice. In the course of this, geometry and mesh creation was done using SALOME 9.14 whereas the institute's in-house research code elPaSo was utilised for the matrix assembly and solving procedure. At first, elPaSo was verified by the commercial software COMSOL 6.3 considering a small-scale test facility. Afterwards, the large-scale test facility finite element model was created using a frequency- and domain-specific discretisation approach. The sound transmission loss of three different test specimens was estimated in one-third-octave bands from 8 Hz to 630 Hz, where the double-leaf wall with insulation exhibited good agreement to the theoretical sound transmission loss profile from literature.
36.0CVApr 22Code
EgoDyn-Bench: Evaluating Ego-Motion Understanding in Vision-Centric Foundation Models for Autonomous DrivingFinn Rasmus Schäfer, Yuan Gao, Dingrui Wang et al.
While Vision-Language Models (VLMs) have advanced highlevel reasoning in autonomous driving, their ability to ground this reasoning in the underlying physics of ego-motion remains poorly understood. We introduce EgoDyn-Bench, a diagnostic benchmark for evaluating the semantic ego-motion understanding of vision-centric foundation models. By mapping continuous vehicle kinematics to discrete motion concepts via a deterministic oracle, we decouple a model's internal physical logic from its visual perception. Our large-scale empirical audit spanning 20 + models, including closed-source MLLMs, open-source VLMs across multiple scales, and specialized VLAs, identifies a significant Perception Bottleneck: while models exhibit logical physical concepts, they consistently fail to accurately align them with visual observations, frequently underperforming classical non-learned geometric baselines. This failure persists across model scales and domain-specific training, indicating a structural deficit in how current architectures couple visual perception with physical reasoning. We demonstrate that providing explicit trajectory encodings substantially restores physical consistency across all evaluated models, revealing a functional disentanglement between vision and language: egomotion logic is derived almost exclusively from the language modality, while visual observations contribute negligible additional signal. This structural finding provides a standardized diagnostic framework and a practical pathway toward physically aligned embodied AI. Keywords: Ego-motion - Physical Reasoning - Foundation Models
3.9DSMay 13
The Power of Graph Doubling: Computing Ultrabubbles in a Bidirected Graph by Reducing to Weak SuperbubblesSebastian Schmidt, Juha Harviainen, Corentin Moumard et al.
Bidirected graphs are a common generalisation of directed graphs where arcs can also be incoming to both their incident nodes, or outgoing from both their incident nodes. Such arcs allow a walk to change direction. Some algorithms can easily be adapted from directed graphs to bidirected graphs, such as shortest path algorithms. These adaptions are already used in practice, and implicitly use the graph doubling technique to apply an algorithm for directed graphs to bidirected graphs. In other cases, the applicability of graph doubling is not that obvious. For example, superbubbles and their generalisation to bidirected graphs ultrabubbles. Ultrabubbles are a common structure in bidirected biological graphs which carries biological meaning, but also functions as a nested clustering method, since an ultrabubble is separated by only two nodes from the rest of the graph. There is an existing method that enumerates a structure similar to ultrabubbles by enumerating (weak) superbubbles in the doubled graph. However, the literature does not make any direct connection between superbubbles and ultrabubbles except that a superbubble is an ultrabubble in a directed graph. Only a partial result connecting superbubbles and ultrabubbles exists by Harviainen et al. (2026). Graph doubling on the other hand maintains connectivity, and allows to draw a direct connection between ultrabubbles and weak superbubbles. This results in the first linear-time reduction-based algorithm for computing ultrabubbles on any bidirected graph. Together with the fact that graph doubling is already used implicitly in simple cases, our result motivates that graph doubling is a powerful yet simple technique to apply algorithms for directed graphs to bidirected graphs.
IRFeb 7, 2024
Detecting Generated Native Ads in Conversational SearchSebastian Schmidt, Ines Zelch, Janek Bevendorff et al.
Conversational search engines such as YouChat and Microsoft Copilot use large language models (LLMs) to generate responses to queries. It is only a small step to also let the same technology insert ads within the generated responses - instead of separately placing ads next to a response. Inserted ads would be reminiscent of native advertising and product placement, both of which are very effective forms of subtle and manipulative advertising. Considering the high computational costs associated with LLMs, for which providers need to develop sustainable business models, users of conversational search engines may very well be confronted with generated native ads in the near future. In this paper, we thus take a first step to investigate whether LLMs can also be used as a countermeasure, i.e., to block generated native ads. We compile the Webis Generated Native Ads 2024 dataset of queries and generated responses with automatically inserted ads, and evaluate whether LLMs or fine-tuned sentence transformers can detect the ads. In our experiments, the investigated LLMs struggle with the task but sentence transformers achieve precision and recall values above 0.9.
CVMay 18, 2024
A Unified Approach Towards Active Learning and Out-of-Distribution DetectionSebastian Schmidt, Leonard Schenk, Leo Schwinn et al.
When applying deep learning models in open-world scenarios, active learning (AL) strategies are crucial for identifying label candidates from a nearly infinite amount of unlabeled data. In this context, robust out-of-distribution (OOD) detection mechanisms are essential for handling data outside the target distribution of the application. However, current works investigate both problems separately. In this work, we introduce SISOM as the first unified solution for both AL and OOD detection. By leveraging feature space distance metrics SISOM combines the strengths of the currently independent tasks to solve both effectively. We conduct extensive experiments showing the problems arising when migrating between both tasks. In these evaluations SISOM underlined its effectiveness by achieving first place in two of the widely used OpenOOD benchmarks and second place in the remaining one. In AL, SISOM outperforms others and delivers top-1 performance in three benchmarks
CVMar 4, 2025
Joint Out-of-Distribution Filtering and Data Discovery Active LearningSebastian Schmidt, Leonard Schenk, Leo Schwinn et al.
As the data demand for deep learning models increases, active learning (AL) becomes essential to strategically select samples for labeling, which maximizes data efficiency and reduces training costs. Real-world scenarios necessitate the consideration of incomplete data knowledge within AL. Prior works address handling out-of-distribution (OOD) data, while another research direction has focused on category discovery. However, a combined analysis of real-world considerations combining AL with out-of-distribution data and category discovery remains unexplored. To address this gap, we propose Joint Out-of-distribution filtering and data Discovery Active learning (Joda) , to uniquely address both challenges simultaneously by filtering out OOD data before selecting candidates for labeling. In contrast to previous methods, we deeply entangle the training procedure with filter and selection to construct a common feature space that aligns known and novel categories while separating OOD samples. Unlike previous works, Joda is highly efficient and completely omits auxiliary models and training access to the unlabeled pool for filtering or selection. In extensive experiments on 18 configurations and 3 metrics, \ours{} consistently achieves the highest accuracy with the best class discovery to OOD filtering balance compared to state-of-the-art competitor approaches.
12.3DSApr 9
Identifying bubble-like subgraphs in linear-time via a unified SPQR-tree frameworkFrancisco Sena, Aleksandr Politov, Corentin Moumard et al.
A fundamental algorithmic problem in computational biology is to find all subgraphs of a given type (superbubbles, snarls, and ultrabubbles) in a directed or bidirected input graph. These correspond to regions of genetic variation and are useful in analyzing collections of genomes. We present the first linear-time algorithms for identifying all snarls and all ultrabubbles, resolving problems open since 2018. The algorithm for snarls relies on a new linear-size representation of all snarls with respect to the number of vertices in the graph. We employ the well-known SPQR-tree decomposition, which encodes all 2-separators of a biconnected graph. After several dynamic-programming-style traversals of this tree, we maintain key properties (such as acyclicity) that allow us to decide whether a given 2-separator defines a subgraph to be reported. A crucial ingredient for linear-time complexity is that acyclicity of linearly many subgraphs can be tested simultaneously via the problem of computing all arcs in a directed graph whose removal renders it acyclic (so-called feedback arcs). As such, we prove a fundamental result for bidirected graphs, that may be of independent interest: all feedback arcs can be computed in linear time for tipless bidirected graphs, while in general this is at least as hard as matrix multiplication, assuming the k-Clique Conjecture. Our results form a unified framework that also yields a completely different linear-time algorithm for finding all superbubbles. Although some of the results are technically involved, the underlying ideas are conceptually simple, and may extend to other bubble-like subgraphs. More broadly, our work contributes to the theoretical foundations of computational biology and advances a growing line of research that uses SPQR-tree decompositions as a general tool for designing efficient algorithms, beyond their traditional role in graph drawing.
AIMay 8, 2024
Developing trustworthy AI applications with foundation modelsMichael Mock, Sebastian Schmidt, Felix Müller et al.
The trustworthiness of AI applications has been the subject of recent research and is also addressed in the EU's recently adopted AI Regulation. The currently emerging foundation models in the field of text, speech and image processing offer completely new possibilities for developing AI applications. This whitepaper shows how the trustworthiness of an AI application developed with foundation models can be evaluated and ensured. For this purpose, the application-specific, risk-based approach for testing and ensuring the trustworthiness of AI applications, as developed in the 'AI Assessment Catalog - Guideline for Trustworthy Artificial Intelligence' by Fraunhofer IAIS, is transferred to the context of foundation models. Special consideration is given to the fact that specific risks of foundation models can have an impact on the AI application and must also be taken into account when checking trustworthiness. Chapter 1 of the white paper explains the fundamental relationship between foundation models and AI applications based on them in terms of trustworthiness. Chapter 2 provides an introduction to the technical construction of foundation models and Chapter 3 shows how AI applications can be developed based on them. Chapter 4 provides an overview of the resulting risks regarding trustworthiness. Chapter 5 shows which requirements for AI applications and foundation models are to be expected according to the draft of the European Union's AI Regulation and Chapter 6 finally shows the system and procedure for meeting trustworthiness requirements.
CVApr 7, 2025
Prior2Former -- Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic SegmentationSebastian Schmidt, Julius Körner, Dominik Fuchsgruber et al.
In panoptic segmentation, individual instances must be separated within semantic classes. As state-of-the-art methods rely on a pre-defined set of classes, they struggle with novel categories and out-of-distribution (OOD) data. This is particularly problematic in safety-critical applications, such as autonomous driving, where reliability in unseen scenarios is essential. We address the gap between outstanding benchmark performance and reliability by proposing Prior2Former (P2F), the first approach for segmentation vision transformers rooted in evidential learning. P2F extends the mask vision transformer architecture by incorporating a Beta prior for computing model uncertainty in pixel-wise binary mask assignments. This design enables high-quality uncertainty estimation that effectively detects novel and OOD objects enabling state-of-the-art anomaly instance segmentation and open-world panoptic segmentation. Unlike most segmentation models addressing unknown classes, P2F operates without access to OOD data samples or contrastive training on void (i.e., unlabeled) classes, making it highly applicable in real-world scenarios where such prior information is unavailable. Additionally, P2F can be flexibly applied to anomaly instance and panoptic segmentation. Through comprehensive experiments on the Cityscapes, COCO, SegmentMeIfYouCan, and OoDIS datasets, P2F demonstrates state-of-the-art performance across the board.
CLMar 19, 2025
EmoGRACE: Aspect-based emotion analysis for social media dataChristina Zorenböhmer, Sebastian Schmidt, Bernd Resch
While sentiment analysis has advanced from sentence to aspect-level, i.e., the identification of concrete terms related to a sentiment, the equivalent field of Aspect-based Emotion Analysis (ABEA) is faced with dataset bottlenecks and the increased complexity of emotion classes in contrast to binary sentiments. This paper addresses these gaps, by generating a first ABEA training dataset, consisting of 2,621 English Tweets, and fine-tuning a BERT-based model for the ABEA sub-tasks of Aspect Term Extraction (ATE) and Aspect Emotion Classification (AEC). The dataset annotation process was based on the hierarchical emotion theory by Shaver et al. [1] and made use of group annotation and majority voting strategies to facilitate label consistency. The resulting dataset contained aspect-level emotion labels for Anger, Sadness, Happiness, Fear, and a None class. Using the new ABEA training dataset, the state-of-the-art ABSA model GRACE by Luo et al. [2] was fine-tuned for ABEA. The results reflected a performance plateau at an F1-score of 70.1% for ATE and 46.9% for joint ATE and AEC extraction. The limiting factors for model performance were broadly identified as the small training dataset size coupled with the increased task complexity, causing model overfitting and limited abilities to generalize well on new data.
CVNov 27, 2025
Unexplored flaws in multiple-choice VQA evaluationsFabio Rosenthal, Sebastian Schmidt, Thorsten Graf et al.
Multimodal Large Language Models (MLLMs) demonstrate strong capabilities in handling image-text inputs. A common way to assess this ability is through multiple-choice Visual Question Answering (VQA). Earlier works have already revealed that these benchmarks are sensitive to answer choice order, a limitation that can be mitigated through careful design. Yet, we highlight additional, unexplored biases in prompt formatting that question the reliability of current MLLM evaluations. Specifically, we identify three key variation factors in prompt formatting and analyze their impact through a large-scale study involving $\mathbf{\text{seven}}$ MLLMs and $\mathbf{\text{five}}$ VQA datasets, spanning $\mathbf{48}$ distinct $\mathbf{\text{prompt format variations}}$. Our findings reveal that multiple-choice VQA is highly sensitive to minor prompt format changes, even when these changes are semantically neutral. We further demonstrate that these biases persist independently of known order biases or the MLLM's confidence in the correct answer. Finally, we demonstrate that existing bias mitigation strategies fail to address these newly identified biases.
CVOct 25, 2025
GeoDiffusion: A Training-Free Framework for Accurate 3D Geometric Conditioning in Image GenerationPhillip Mueller, Talip Uenlue, Sebastian Schmidt et al.
Precise geometric control in image generation is essential for engineering \& product design and creative industries to control 3D object features accurately in image space. Traditional 3D editing approaches are time-consuming and demand specialized skills, while current image-based generative methods lack accuracy in geometric conditioning. To address these challenges, we propose GeoDiffusion, a training-free framework for accurate and efficient geometric conditioning of 3D features in image generation. GeoDiffusion employs a class-specific 3D object as a geometric prior to define keypoints and parametric correlations in 3D space. We ensure viewpoint consistency through a rendered image of a reference 3D object, followed by style transfer to meet user-defined appearance specifications. At the core of our framework is GeoDrag, improving accuracy and speed of drag-based image editing on geometry guidance tasks and general instructions on DragBench. Our results demonstrate that GeoDiffusion enables precise geometric modifications across various iterative design workflows.
CVOct 12, 2025
A Machine Learning Perspective on Automated Driving Corner CasesSebastian Schmidt, Julius Körner, Stephan Günnemann
For high-stakes applications, like autonomous driving, a safe operation is necessary to prevent harm, accidents, and failures. Traditionally, difficult scenarios have been categorized into corner cases and addressed individually. However, this example-based categorization is not scalable and lacks a data coverage perspective, neglecting the generalization to training data of machine learning models. In our work, we propose a novel machine learning approach that takes the underlying data distribution into account. Based on our novel perspective, we present a framework for effective corner case recognition for perception on individual samples. In our evaluation, we show that our approach (i) unifies existing scenario-based corner case taxonomies under a distributional perspective, (ii) achieves strong performance on corner case detection tasks across standard benchmarks for which we extend established out-of-distribution detection benchmarks, and (iii) enables analysis of combined corner cases via a newly introduced fog-augmented Lost & Found dataset. These results provide a principled basis for corner case recognition, underlining our manual specification-free definition.
CVAug 27, 2025
Scalable Object Detection in the Car Interior With Vision Foundation ModelsBálint Mészáros, Ahmet Firintepe, Sebastian Schmidt et al.
AI tasks in the car interior like identifying and localizing externally introduced objects is crucial for response quality of personal assistants. However, computational resources of on-board systems remain highly constrained, restricting the deployment of such solutions directly within the vehicle. To address this limitation, we propose the novel Object Detection and Localization (ODAL) framework for interior scene understanding. Our approach leverages vision foundation models through a distributed architecture, splitting computational tasks between on-board and cloud. This design overcomes the resource constraints of running foundation models directly in the car. To benchmark model performance, we introduce ODALbench, a new metric for comprehensive assessment of detection and localization.Our analysis demonstrates the framework's potential to establish new standards in this domain. We compare the state-of-the-art GPT-4o vision foundation model with the lightweight LLaVA 1.5 7B model and explore how fine-tuning enhances the lightweight models performance. Remarkably, our fine-tuned ODAL-LLaVA model achieves an ODAL$_{score}$ of 89%, representing a 71% improvement over its baseline performance and outperforming GPT-4o by nearly 20%. Furthermore, the fine-tuned model maintains high detection accuracy while significantly reducing hallucinations, achieving an ODAL$_{SNR}$ three times higher than GPT-4o.
LGJun 10, 2025
Effective Data Pruning through Score ExtrapolationSebastian Schmidt, Prasanga Dhungel, Christoffer Löffler et al.
Training advanced machine learning models demands massive datasets, resulting in prohibitive computational costs. To address this challenge, data pruning techniques identify and remove redundant training samples while preserving model performance. Yet, existing pruning techniques predominantly require a full initial training pass to identify removable samples, negating any efficiency benefits for single training runs. To overcome this limitation, we introduce a novel importance score extrapolation framework that requires training on only a small subset of data. We present two initial approaches in this framework - k-nearest neighbors and graph neural networks - to accurately predict sample importance for the entire dataset using patterns learned from this minimal subset. We demonstrate the effectiveness of our approach for 2 state-of-the-art pruning methods (Dynamic Uncertainty and TDDS), 4 different datasets (CIFAR-10, CIFAR-100, Places-365, and ImageNet), and 3 training paradigms (supervised, unsupervised, and adversarial). Our results indicate that score extrapolation is a promising direction to scale expensive score calculation methods, such as pruning, data attribution, or other tasks.
CLSep 7, 2021
FH-SWF SG at GermEval 2021: Using Transformer-Based Language Models to Identify Toxic, Engaging, & Fact-Claiming CommentsChristian Gawron, Sebastian Schmidt
In this paper we describe the methods we used for our submissions to the GermEval 2021 shared task on the identification of toxic, engaging, and fact-claiming comments. For all three subtasks we fine-tuned freely available transformer-based models from the Huggingface model hub. We evaluated the performance of various pre-trained models after fine-tuning on 80% of the training data with different hyperparameters and submitted predictions of the two best performing resulting models. We found that this approach worked best for subtask 3, for which we achieved an F1-score of 0.736.