SY, Mar 31
From Big Data to Fast Data: Towards High-Quality Datasets for Machine Learning Applications from Closed-Loop Data Collection
Philipp Reis, Jacqueline Henle, Stefan Otten et al.
The increasing capabilities of machine learning models, such as vision-language and multimodal language models, are placing growing demands on data in automotive systems engineering, making the quality and relevance of collected data key enablers for the development and validation of such systems. Traditional Big Data approaches focus on large-scale data collection and offline processing, while Smart Data approaches improve data selection strategies but still rely on centralized and offline post-processing. This paper introduces the concept of Fast Data for automotive systems engineering. The approach shifts data selection and recording onto the vehicle as the data source. By enabling real-time, context-aware decisions on whether and which data should be recorded, data collection can be directly aligned with data quality objectives and collection strategies within a closed loop. This results in datasets with higher relevance, improved coverage of critical scenarios, and increased information density, while at the same time reducing irrelevant data and associated costs. The proposed approach provides a structured foundation for designing data collection strategies that are aligned with the needs of modern machine learning algorithms. It supports efficient data acquisition and contributes to scalable and cost-effective ML development processes in automotive systems engineering.
DB, Mar 13
A Domain-Specific Language for LLM-Driven Trigger Generation in Multimodal Data Collection
Philipp Reis, Philipp Rigoll, Martin Zehetner et al.
Data-driven systems depend on task-relevant data, yet data collection pipelines remain passive and indiscriminate. Continuous logging of multimodal sensor streams incurs high storage costs and captures irrelevant data. This paper proposes a declarative framework for intent-driven, on-device data collection that enables selective collection of multimodal sensor data based on high-level user requests. The framework combines natural language interaction with a formally specified domain-specific language (DSL). Large language models translate user-defined requirements into verifiable and composable DSL programs that define conditional triggers across heterogeneous sensors, including cameras, LiDAR, and system telemetry. Empirical evaluation on vehicular and robotic perception tasks shows that the DSL-based approach achieves higher generation consistency and lower execution latency than unconstrained code generation while maintaining comparable detection performance. The structured abstraction supports modular trigger composition and concurrent deployment on resource-constrained edge platforms. This approach replaces passive logging with a verifiable, intent-driven mechanism for multimodal data collection in real-time systems.
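To make the idea of composable conditional triggers concrete, here is a minimal sketch in Python. It is not the paper's DSL: the `Trigger` class, the `above` and `below` helpers, and all signal names and thresholds are hypothetical, chosen only to illustrate how small predicates over heterogeneous sensor signals could be combined into one recording decision.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

# Hypothetical frame type: one snapshot of named, already-extracted signals.
Frame = Mapping[str, float]


@dataclass(frozen=True)
class Trigger:
    """A named predicate over a frame; composable with &, | and ~."""
    name: str
    predicate: Callable[[Frame], bool]

    def __call__(self, frame: Frame) -> bool:
        return self.predicate(frame)

    def __and__(self, other: "Trigger") -> "Trigger":
        return Trigger(f"({self.name} AND {other.name})",
                       lambda f: self(f) and other(f))

    def __or__(self, other: "Trigger") -> "Trigger":
        return Trigger(f"({self.name} OR {other.name})",
                       lambda f: self(f) or other(f))

    def __invert__(self) -> "Trigger":
        return Trigger(f"(NOT {self.name})", lambda f: not self(f))


def above(signal: str, threshold: float) -> Trigger:
    # A missing signal never fires the trigger.
    return Trigger(f"{signal} > {threshold}",
                   lambda f: f.get(signal, float("-inf")) > threshold)


def below(signal: str, threshold: float) -> Trigger:
    # A missing signal never fires the trigger.
    return Trigger(f"{signal} < {threshold}",
                   lambda f: f.get(signal, float("inf")) < threshold)


# Illustrative composition; signal names and thresholds are assumptions.
hard_braking = below("accel_mps2", -4.0)
pedestrian_near = above("pedestrian_confidence", 0.8) & below("lidar_min_range_m", 10.0)
record = hard_braking | pedestrian_near

frames = [
    {"accel_mps2": -5.0},
    {"accel_mps2": -1.0, "pedestrian_confidence": 0.9, "lidar_min_range_m": 5.0},
    {"accel_mps2": -1.0, "pedestrian_confidence": 0.9, "lidar_min_range_m": 50.0},
]
recorded = [frame for frame in frames if record(frame)]
```

Because each trigger is a plain value with a readable `name`, a composed expression can be inspected before deployment, which is one way to picture the verifiability that a constrained language offers over unconstrained generated code.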
SE, Feb 8, 2021
SceML - A Graphical Modeling Framework for Scenario-based Testing of Autonomous Vehicles
Barbara Schuett, Thilo Braun, Stefan Otten et al.
Ensuring the functional correctness and safety of autonomous vehicles is a major challenge for the automotive industry. However, exhaustive physical test drives are not feasible, as billions of driven kilometers would be required to obtain reliable results. Scenario-based testing is an approach to tackle this problem and reduce necessary test drives by replacing driven kilometers with simulations of relevant or interesting scenarios. These scenarios can be generated or extracted from recorded data with machine learning algorithms or created by experts. In this paper, we propose a novel graphical scenario modeling language. The graphical framework allows experts to create new scenarios or review ones designed by other experts or generated by machine learning algorithms. The scenario description is modeled as a graph and based on behavior trees. It supports different abstraction levels of scenario description during software and test development. Additionally, the graph-based structure provides modularity and reusable sub-scenarios, an important use case in scenario modeling. A graphical visualization of the scenario enhances comprehensibility for different users. The presented approach eases the scenario creation process and increases the usage of scenarios within development and testing processes.
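As a rough illustration of how a behavior-tree structure yields reusable sub-scenarios, the following Python sketch builds a scenario from a few node types. It does not reproduce SceML's notation or semantics: the `Action` and `Sequence` classes, the actor names, and the maneuvers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Action:
    """Leaf node: one maneuver performed by one actor (hypothetical model)."""
    actor: str
    maneuver: str

    def run(self, log: List[str]) -> bool:
        log.append(f"{self.actor}: {self.maneuver}")
        return True  # in this sketch, actions always succeed


@dataclass
class Sequence:
    """Composite node: runs children in order and stops at the first failure."""
    children: List[Union["Action", "Sequence"]]

    def run(self, log: List[str]) -> bool:
        return all(child.run(log) for child in self.children)


# A sub-scenario defined once ...
lane_change_left = Sequence([
    Action("lead", "signal left"),
    Action("lead", "change lane left"),
])

# ... and reused as a subtree inside a larger scenario.
cut_in = Sequence([
    Action("ego", "follow lane"),
    lane_change_left,
    Action("lead", "brake"),
])

log: List[str] = []
succeeded = cut_in.run(log)
```

The same `lane_change_left` subtree could be placed into any number of other scenarios, which is the kind of modularity a graph-based description makes straightforward.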