LG · Nov 5, 2025
A Feedback-Control Framework for Efficient Dataset Collection from In-Vehicle Data Streams
Philipp Reis, Philipp Rigoll, Christian Steinhauser et al.
Modern AI systems are increasingly constrained not by model capacity but by the quality and diversity of their data. Despite growing emphasis on data-centric AI, most datasets are still gathered in an open-loop manner, which accumulates redundant samples without feedback from the current coverage. This results in inefficient storage, costly labeling, and limited generalization. To address this, this paper introduces Feedback Control Data Collection (FCDC), a paradigm that formulates data collection as a closed-loop control problem. FCDC continuously approximates the state of the collected data distribution using an online probabilistic model and adaptively regulates sample retention based on feedback signals such as likelihood and Mahalanobis distance. Through this feedback mechanism, the system dynamically balances exploration and exploitation, maintains dataset diversity, and prevents redundancy from accumulating over time. In addition to demonstrating the controllability of FCDC on a synthetic dataset that converges toward a uniform distribution under a Gaussian input assumption, experiments on real data streams show that FCDC produces datasets that are 25.9% more balanced while reducing data storage by 39.8%. These results demonstrate that data collection itself can be actively controlled, transforming collection from a passive pipeline stage into a self-regulating, feedback-driven process at the core of data-centric AI.
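The closed-loop idea in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the diagonal-covariance Gaussian, the fixed `threshold`, the `warmup` phase, and the names `OnlineDiagonalGaussian`, `collect`, and `gaussian_stream` are all assumptions made for illustration. The sketch keeps a running Gaussian model of the samples stored so far and retains a new sample only if its Mahalanobis distance to that model exceeds a threshold.

```python
import math
import random


class OnlineDiagonalGaussian:
    """Running per-dimension mean and variance (Welford's algorithm).

    Assumption: a diagonal covariance, chosen only to keep the sketch short.
    """

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations

    def update(self, x):
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])

    def mahalanobis(self, x, eps=1e-9):
        # With a diagonal covariance the Mahalanobis distance reduces to a
        # variance-scaled Euclidean distance.
        total = 0.0
        for i, xi in enumerate(x):
            var = self.m2[i] / max(self.n - 1, 1) + eps
            total += (xi - self.mean[i]) ** 2 / var
        return math.sqrt(total)


def collect(stream, dim, threshold=1.5, warmup=10):
    """Retain a sample only if it is far from what has already been stored.

    `threshold` and `warmup` are hypothetical tuning parameters.
    """
    model = OnlineDiagonalGaussian(dim)
    kept = []
    for x in stream:
        if model.n < warmup or model.mahalanobis(x) > threshold:
            kept.append(x)
            model.update(x)  # feedback: the model tracks the stored set only
    return kept


def gaussian_stream(n, dim, seed=0):
    """Hypothetical stand-in for a sensor stream: i.i.d. standard normal samples."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
```

Calling `collect(gaussian_stream(1000, 2), dim=2)` returns the retained subset. Because the model is updated only with retained samples, densely covered regions are rejected more often as collection proceeds, which is the feedback effect the abstract describes.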
CV · Mar 28, 2025
Data Quality Matters: Quantifying Image Quality Impact on Machine Learning Performance
Christian Steinhauser, Philipp Reis, Hubert Padusinski et al.
Precise perception of the environment is essential in highly automated driving systems, which rely on machine learning tasks such as object detection and segmentation. Compression of sensor data is commonly used for data handling, while virtualization is used for hardware-in-the-loop validation. Both methods can alter sensor data and degrade model performance. This necessitates a systematic approach to quantifying image validity. This paper presents a four-step framework to evaluate the impact of image modifications on machine learning tasks. First, a dataset of modified images is prepared with one-to-one matched image pairs, enabling measurement of deviations resulting from compression and virtualization. Second, image deviations are quantified by comparing the effects of compression and virtualization against original camera-based sensor data. Third, the performance of state-of-the-art object detection models is analyzed to determine how altered input data affects perception tasks, including bounding box accuracy and reliability. Finally, a correlation analysis is performed to identify relationships between image quality and model performance. The results show that the LPIPS metric achieves the highest correlation between image deviation and machine learning performance across all evaluated machine learning tasks.
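The final step of this framework, the correlation analysis, can be sketched as follows. The abstract does not say which correlation coefficient is used, so Pearson correlation is an assumption here, and the helper names `pearson` and `rank_metrics` are hypothetical. The sketch ranks image-deviation metrics by how strongly their per-image scores correlate with a per-image model performance score.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def rank_metrics(deviations, performance):
    """Sort metric names by |correlation| with the performance scores.

    deviations:  dict mapping a metric name to its per-image deviation scores
    performance: per-image model performance scores, in the same image order
    """
    scores = {name: pearson(vals, performance) for name, vals in deviations.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

In use, `deviations` would hold one list of per-image scores for each image quality metric under comparison, and the first entry of the returned list is the metric most strongly correlated with performance, whether positively or negatively.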
RO · May 5, 2025
Point Cloud Recombination: Systematic Real Data Augmentation Using Robotic Targets for LiDAR Perception Validation
Hubert Padusinski, Christian Steinhauser, Christian Scherl et al.
The validation of LiDAR-based perception of intelligent mobile systems operating in open-world applications remains a challenge due to the variability of real environmental conditions. Virtual simulations allow the generation of arbitrary scenes under controlled conditions but lack physical sensor characteristics, such as intensity responses or material-dependent effects. In contrast, real-world data offers true sensor realism but provides less control over influencing factors, hindering sufficient validation. Existing approaches address this problem by augmenting real-world point cloud data with objects transferred between scenes. However, these methods do not consider validation and remain limited in controllability because they rely on empirical data. We address these limitations by proposing Point Cloud Recombination, which systematically augments captured point cloud scenes by integrating point clouds acquired from physical target objects measured in controlled laboratory environments. This enables the creation of vast amounts and varieties of repeatable, physically accurate test scenes, with phenomena-aware occlusions based on registered 3D meshes. Using the Ouster OS1-128 Rev7 sensor, we demonstrate the augmentation of real-world urban and rural scenes with repeatably positioned humanoid targets featuring varied clothing and poses. We show that the recombined scenes closely match real sensor outputs, enabling targeted testing, scalable failure analysis, and improved system safety. By providing controlled yet sensor-realistic data, our method enables trustworthy conclusions about the limitations of specific sensors in combination with their algorithms, e.g., object detection.
RO · Jan 26, 2024
The Machine Vision Iceberg Explained: Advancing Dynamic Testing by Considering Holistic Environmental Relations
Hubert Padusinski, Christian Steinhauser, Thilo Braun et al.
Machine Vision (MV) is essential for solving driving automation. This paper examines potential shortcomings in current MV testing strategies for highly automated driving (HAD) systems. We argue for a more comprehensive understanding of the performance factors that must be considered during the MV evaluation process, noting that neglecting these factors can lead to significant risks. This is relevant not only to MV component testing but also to integration testing. To illustrate this point, we draw an analogy to a ship navigating towards an iceberg to show potential hidden challenges in current MV testing strategies. The main contribution is a novel framework for black-box testing that observes environmental relations. It is designed to enhance MV assessments by considering the attributes and surroundings of relevant individual objects. The framework identifies seven general concerns about MV object recognition that are not adequately addressed in established test processes. To detect these deficits based on their performance factors, we propose the use of a taxonomy called "granularity orders" along with a graphical representation. This allows MV uncertainties to be identified across a range of driving scenarios. The approach aims to advance the precision, efficiency, and completeness of testing procedures for MV.