Eric Sax

CV
h-index11
16papers
102citations
Novelty44%
AI Score49

16 Papers

SEMay 17, 2022
An Application of Scenario Exploration to Find New Scenarios for the Development and Testing of Automated Driving Systems in Urban Scenarios

Barbara Schütt, Marc Heinrich, Sonja Marahrens et al.

Verification and validation are major challenges for developing automated driving systems. A concept that gets more and more recognized for testing in automated driving is scenario-based testing. However, it introduces the problem of what scenarios are relevant for testing and which are not. This work aims to find relevant, interesting, or critical parameter sets within logical scenarios by utilizing Bayes optimization and Gaussian processes. The parameter optimization is done by comparing and evaluating six different metrics in two urban intersection scenarios. Finally, a list of ideas this work leads to and should be investigated further is presented.

SYMar 31
From Big Data to Fast Data: Towards High-Quality Datasets for Machine Learning Applications from Closed-Loop Data Collection

Philipp Reis, Jacqueline Henle, Stefan Otten et al.

The increasing capabilities of machine learning models, such as vision-language and multimodal language models, are placing growing demands on data in automotive systems engineering, making the quality and relevance of collected data enablers for the development and validation of such systems. Traditional Big Data approaches focus on large-scale data collection and offline processing, while Smart Data approaches improve data selection strategies but still rely on centralized and offline post-processing. This paper introduces the concept of Fast Data for automotive systems engineering. The approach shifts data selection and recording onto the vehicle as the data source. By enabling real-time, context-aware decisions on whether and which data should be recorded, data collection can be directly aligned with data quality objectives and collection strategies within a closed-loop. This results in datasets with higher relevance, improved coverage of critical scenarios, and increased information density, while at the same time reducing irrelevant data and associated costs. The proposed approach provides a structured foundation for designing data collection strategies that are aligned with the needs of modern machine learning algorithms. It supports efficient data acquisition and contributes to scalable and cost-effective ML development processes in automotive systems engineering.

DBMar 13
A Domain-Specific Language for LLM-Driven Trigger Generation in Multimodal Data Collection

Philipp Reis, Philipp Rigoll, Martin Zehetner et al.

Data-driven systems depend on task-relevant data, yet data collection pipelines remain passive and indiscriminate. Continuous logging of multimodal sensor streams incurs high storage costs and captures irrelevant data. This paper proposes a declarative framework for intent-driven, on-device data collection that enables selective collection of multimodal sensor data based on high-level user requests. The framework combines natural language interaction with a formally specified domain-specific language (DSL). Large language models translate user-defined requirements into verifiable and composable DSL programs that define conditional triggers across heterogeneous sensors, including cameras, LiDAR, and system telemetry. Empirical evaluation on vehicular and robotic perception tasks shows that the DSL-based approach achieves higher generation consistency and lower execution latency than unconstrained code generation while maintaining comparable detection performance. The structured abstraction supports modular trigger composition and concurrent deployment on resource-constrained edge platforms. This approach replaces passive logging with a verifiable, intent-driven mechanism for multimodal data collection in real-time systems.

LGNov 5, 2025
A Feedback-Control Framework for Efficient Dataset Collection from In-Vehicle Data Streams

Philipp Reis, Philipp Rigoll, Christian Steinhauser et al.

Modern AI systems are increasingly constrained not by model capacity but by the quality and diversity of their data. Despite growing emphasis on data-centric AI, most datasets are still gathered in an open-loop manner which accumulates redundant samples without feedback from the current coverage. This results in inefficient storage, costly labeling, and limited generalization. To address this, this paper introduces Feedback Control Data Collection (FCDC), a paradigm that formulates data collection as a closed-loop control problem. FCDC continuously approximates the state of the collected data distribution using an online probabilistic model and adaptively regulates sample retention using based on feedback signals such as likelihood and Mahalanobis distance. Through this feedback mechanism, the system dynamically balances exploration and exploitation, maintains dataset diversity, and prevents redundancy from accumulating over time. In addition to demonstrating the controllability of FCDC on a synthetic dataset that converges toward a uniform distribution under Gaussian input assumption, experiments on real data streams show that FCDC produces more balanced datasets by 25.9% while reducing data storage by 39.8%. These results demonstrate that data collection itself can be actively controlled, transforming collection from a passive pipeline stage into a self-regulating, feedback-driven process at the core of data-centric AI.

CVDec 17, 2025
R4: Retrieval-Augmented Reasoning for Vision-Language Models in 4D Spatio-Temporal Space

Tin Stribor Sohn, Maximilian Dillitzer, Jason J. Corso et al.

Humans perceive and reason about their surroundings in four dimensions by building persistent, structured internal representations that encode semantic meaning, spatial layout, and temporal dynamics. These multimodal memories enable them to recall past events, infer unobserved states, and integrate new information into context-dependent reasoning. Inspired by this capability, we introduce R4, a training-free framework for retrieval-augmented reasoning in 4D spatio-temporal space that equips vision-language models (VLMs) with structured, lifelong memory. R4 continuously constructs a 4D knowledge database by anchoring object-level semantic descriptions in metric space and time, yielding a persistent world model that can be shared across agents. At inference, natural language queries are decomposed into semantic, spatial, and temporal keys to retrieve relevant observations, which are integrated into the VLM's reasoning. Unlike classical retrieval-augmented generation methods, retrieval in R4 operates directly in 4D space, enabling episodic and collaborative reasoning without training. Experiments on embodied question answering and navigation benchmarks demonstrate that R4 substantially improves retrieval and reasoning over spatio-temporal information compared to baselines, advancing a new paradigm for embodied 4D reasoning in dynamic environments.

RODec 19, 2025
Embodied4C: Measuring What Matters for Embodied Vision-Language Navigation

Tin Stribor Sohn, Maximilian Dillitzer, Jason J. Corso et al.

Vision-language navigation requires agents to reason and act under constraints of embodiment. While vision-language models (VLMs) demonstrate strong generalization, current benchmarks provide limited understanding of how embodiment -- i.e., the choice of physical platform, sensor configuration, and modality alignment -- influences perception, reasoning, and control. We introduce Embodied4C, a closed-loop benchmark designed as a Turing test for embodied reasoning. The benchmark evaluates the core embodied capabilities of VLMs across three heterogeneous embodiments -- autonomous vehicles, aerial drones, and robotic manipulators -- through approximately 1.1K one-shot reasoning questions and 58 goal-directed navigation tasks. These tasks jointly assess four foundational dimensions: semantic, spatial, temporal, and physical reasoning. Each embodiment presents dynamic sensor configurations and environment variations to probe generalization beyond platform-specific adaptation. To prevent embodiment overfitting, Embodied4C integrates domain-far queries targeting abstract and cross-context reasoning. Comprehensive evaluation across ten state-of-the-art VLMs and four embodied control baselines shows that cross-modal alignment and instruction tuning matter more than scale, while spatial and temporal reasoning remains the primary bottleneck for reliable embodied competence.

CVDec 18, 2025
SNOW: Spatio-Temporal Scene Understanding with World Knowledge for Open-World Embodied Reasoning

Tin Stribor Sohn, Maximilian Dillitzer, Jason J. Corso et al.

Autonomous robotic systems require spatio-temporal understanding of dynamic environments to ensure reliable navigation and interaction. While Vision-Language Models (VLMs) provide open-world semantic priors, they lack grounding in 3D geometry and temporal dynamics. Conversely, geometric perception captures structure and motion but remains semantically sparse. We propose SNOW (Scene Understanding with Open-World Knowledge), a training-free and backbone-agnostic framework for unified 4D scene understanding that integrates VLM-derived semantics with point cloud geometry and temporal consistency. SNOW processes synchronized RGB images and 3D point clouds, using HDBSCAN clustering to generate object-level proposals that guide SAM2-based segmentation. Each segmented region is encoded through our proposed Spatio-Temporal Tokenized Patch Encoding (STEP), producing multimodal tokens that capture localized semantic, geometric, and temporal attributes. These tokens are incrementally integrated into a 4D Scene Graph (4DSG), which serves as 4D prior for downstream reasoning. A lightweight SLAM backend anchors all STEP tokens spatially in the environment, providing the global reference alignment, and ensuring unambiguous spatial grounding across time. The resulting 4DSG forms a queryable, unified world model through which VLMs can directly interpret spatial scene structure and temporal dynamics. Experiments on a diverse set of benchmarks demonstrate that SNOW enables precise 4D scene understanding and spatially grounded inference, thereby setting new state-of-the-art performance in several settings, highlighting the importance of structured 4D priors for embodied reasoning and autonomous robotics.

RODec 4, 2023
Unveiling Objects with SOLA: An Annotation-Free Image Search on the Object Level for Automotive Data Sets

Philipp Rigoll, Jacob Langner, Eric Sax

Huge image data sets are the fundament for the development of the perception of automated driving systems. A large number of images is necessary to train robust neural networks that can cope with diverse situations. A sufficiently large data set contains challenging situations and objects. For testing the resulting functions, it is necessary that these situations and objects can be found and extracted from the data set. While it is relatively easy to record a large amount of unlabeled data, it is far more difficult to find demanding situations and objects. However, during the development of perception systems, it must be possible to access challenging data without having to perform lengthy and time-consuming annotations. A developer must therefore be able to search dynamically for specific situations and objects in a data set. Thus, we designed a method which is based on state-of-the-art neural networks to search for objects with certain properties within an image. For the ease of use, the query of this search is described using natural language. To determine the time savings and performance gains, we evaluated our method qualitatively and quantitatively on automotive data sets.

CVMar 28, 2025
Data Quality Matters: Quantifying Image Quality Impact on Machine Learning Performance

Christian Steinhauser, Philipp Reis, Hubert Padusinski et al.

Precise perception of the environment is essential in highly automated driving systems, which rely on machine learning tasks such as object detection and segmentation. Compression of sensor data is commonly used for data handling, while virtualization is used for hardware-in-the-loop validation. Both methods can alter sensor data and degrade model performance. This necessitates a systematic approach to quantifying image validity. This paper presents a four-step framework to evaluate the impact of image modifications on machine learning tasks. First, a dataset with modified images is prepared to ensure one-to-one matching image pairs, enabling measurement of deviations resulting from compression and virtualization. Second, image deviations are quantified by comparing the effects of compression and virtualization against original camera-based sensor data. Third, the performance of state-of-the-art object detection models is analyzed to determine how altered input data affects perception tasks, including bounding box accuracy and reliability. Finally, a correlation analysis is performed to identify relationships between image quality and model performance. As a result, the LPIPS metric achieves the highest correlation between image deviation and machine learning performance across all evaluated machine learning tasks.

RODec 21, 2023
EfficientPPS: Part-aware Panoptic Segmentation of Transparent Objects for Robotic Manipulation

Benjamin Alt, Minh Dang Nguyen, Andreas Hermann et al.

The use of autonomous robots for assistance tasks in hospitals has the potential to free up qualified staff and im-prove patient care. However, the ubiquity of deformable and transparent objects in hospital settings poses signif-icant challenges to vision-based perception systems. We present EfficientPPS, a neural architecture for part-aware panoptic segmentation that provides robots with semantically rich visual information for grasping and ma-nipulation tasks. We also present an unsupervised data collection and labelling method to reduce the need for human involvement in the training process. EfficientPPS is evaluated on a dataset containing real-world hospital objects and demonstrated to be robust and efficient in grasping transparent transfusion bags with a collaborative robot arm.

CVJul 6, 2025
A Data-Driven Novelty Score for Diverse In-Vehicle Data Recording

Philipp Reis, Joshua Ransiek, David Petri et al.

High-quality datasets are essential for training robust perception systems in autonomous driving. However, real-world data collection is often biased toward common scenes and objects, leaving novel cases underrepresented. This imbalance hinders model generalization and compromises safety. The core issue is the curse of rarity. Over time, novel events occur infrequently, and standard logging methods fail to capture them effectively. As a result, large volumes of redundant data are stored, while critical novel cases are diluted, leading to biased datasets. This work presents a real-time data selection method focused on object-level novelty detection to build more balanced and diverse datasets. The method assigns a data-driven novelty score to image frames using a novel dynamic Mean Shift algorithm. It models normal content based on mean and covariance statistics to identify frames with novel objects, discarding those with redundant elements. The main findings show that reducing the training dataset size with this method can improve model performance, whereas higher redundancy tends to degrade it. Moreover, as data redundancy increases, more aggressive filtering becomes both possible and beneficial. While random sampling can offer some gains, it often leads to overfitting and unpredictability in outcomes. The proposed method supports real-time deployment with 32 frames per second and is constant over time. By continuously updating the definition of normal content, it enables efficient detection of novelties in a continuous data stream.

CVMar 14, 2025
A Framework for a Capability-driven Evaluation of Scenario Understanding for Multimodal Large Language Models in Autonomous Driving

Tin Stribor Sohn, Philipp Reis, Maximilian Dillitzer et al.

Multimodal large language models (MLLMs) hold the potential to enhance autonomous driving by combining domain-independent world knowledge with context-specific language guidance. Their integration into autonomous driving systems shows promising results in isolated proof-of-concept applications, while their performance is evaluated on selective singular aspects of perception, reasoning, or planning. To leverage their full potential a systematic framework for evaluating MLLMs in the context of autonomous driving is required. This paper proposes a holistic framework for a capability-driven evaluation of MLLMs in autonomous driving. The framework structures scenario understanding along the four core capability dimensions semantic, spatial, temporal, and physical. They are derived from the general requirements of autonomous driving systems, human driver cognition, and language-based reasoning. It further organises the domain into context layers, processing modalities, and downstream tasks such as language-based interaction and decision-making. To illustrate the framework's applicability, two exemplary traffic scenarios are analysed, grounding the proposed dimensions in realistic driving situations. The framework provides a foundation for the structured evaluation of MLLMs' potential for scenario understanding in autonomous driving.

CVApr 8, 2024
CLIPping the Limits: Finding the Sweet Spot for Relevant Images in Automated Driving Systems Perception Testing

Philipp Rigoll, Laurenz Adolph, Lennart Ries et al.

Perception systems, especially cameras, are the eyes of automated driving systems. Ensuring that they function reliably and robustly is therefore an important building block in the automation of vehicles. There are various approaches to test the perception of automated driving systems. Ultimately, however, it always comes down to the investigation of the behavior of perception systems under specific input data. Camera images are a crucial part of the input data. Image data sets are therefore collected for the testing of automated driving systems, but it is non-trivial to find specific images in these data sets. Thanks to recent developments in neural networks, there are now methods for sorting the images in a data set according to their similarity to a prompt in natural language. In order to further automate the provision of search results, we make a contribution by automating the threshold definition in these sorted results and returning only the images relevant to the prompt as a result. Our focus is on preventing false positives and false negatives equally. It is also important that our method is robust and in the case that our assumptions are not fulfilled, we provide a fallback solution.

ROJan 26, 2024
The Machine Vision Iceberg Explained: Advancing Dynamic Testing by Considering Holistic Environmental Relations

Hubert Padusinski, Christian Steinhauser, Thilo Braun et al.

Machine Vision (MV) is essential for solving driving automation. This paper examines potential shortcomings in current MV testing strategies for highly automated driving (HAD) systems. We argue for a more comprehensive understanding of the performance factors that must be considered during the MV evaluation process, noting that neglecting these factors can lead to significant risks. This is not only relevant to MV component testing, but also to integration testing. To illustrate this point, we draw an analogy to a ship navigating towards an iceberg to show potential hidden challenges in current MV testing strategies. The main contribution is a novel framework for black-box testing which observes environmental relations. This means it is designed to enhance MV assessments by considering the attributes and surroundings of relevant individual objects. The framework provides the identification of seven general concerns about the object recognition of MV, which are not addressed adequately in established test processes. To detect these deficits based on their performance factors, we propose the use of a taxonomy called "granularity orders" along with a graphical representation. This allows an identification of MV uncertainties across a range of driving scenarios. This approach aims to advance the precision, efficiency, and completeness of testing procedures for MV.

SEFeb 12, 2021
A taxonomy for quality in simulation-based development and testing of automated driving systems

Barbara Schütt, Markus Steimle, Birte Kramer et al.

Ensuring the quality of automated driving systems is a major challenge the automotive industry is facing. In this context, quality defines the degree to which an object meets expectations and requirements. Especially, automated vehicles at SAE level 4 and 5 will be expected to operate safely in various contexts and complex situations without misconduct. Thus, a systematic approach is needed to show their safe operation. A way to address this challenge is simulation-based testing as pure physical testing is not feasible. During simulation-based testing, the data used to evaluate the actual quality of an automated driving system are generated using a simulation. However, to rely on these simulation data, the overall simulation, which also includes its simulation models, must provide a certain quality level. This quality level depends on the intended purpose for which the generated simulation data should be used. Therefore, three categories of quality can be considered: quality of the automated driving system and simulation quality, consisting of simulation model quality and scenario quality. Hence, quality must be determined and evaluated in various process steps in developing and testing automated driving systems, the overall simulation, and the simulation models used for the simulation. In this paper, we propose a taxonomy to serve a better understanding of the concept of quality in the development and testing process to have a clear separation and insight where further testing is needed -- both in terms of automated driving systems and simulation, including their simulation models and scenarios used for testing.

SEFeb 8, 2021
SceML - A Graphical Modeling Framework for Scenario-based Testing of Autonomous Vehicles

Barbara Schuett, Thilo Braun, Stefan Otten et al.

Ensuring the functional correctness and safety of autonomous vehicles is a major challenge for the automotive industry. However, exhaustive physical test drives are not feasible, as billions of driven kilometers would be required to obtain reliable results. Scenariobased testing is an approach to tackle this problem and reduce necessary test drives by replacing driven kilometers with simulations of relevant or interesting scenarios. These scenarios can be generated or extracted from recorded data with machine learning algorithms or created by experts. In this paper, we propose a novel graphical scenario modeling language. The graphical framework allows experts to create new scenarios or review ones designed by other experts or generated by machine learning algorithms. The scenario description is modeled as a graph and based on behavior trees. It supports different abstraction levels of scenario description during software and test development. Additionally, the graphbased structure provides modularity and reusable sub-scenarios, an important use case in scenario modeling. A graphical visualization of the scenario enhances comprehensibility for different users. The presented approach eases the scenario creation process and increases the usage of scenarios within development and testing processes.