CV Feb 17, 2023
LDFA: Latent Diffusion Face Anonymization for Self-driving Applications
Marvin Klemp, Kevin Rösch, Royden Wagner et al.
In order to protect vulnerable road users (VRUs), such as pedestrians or cyclists, it is essential that intelligent transportation systems (ITS) accurately identify them. Therefore, datasets used to train perception models of ITS must contain a significant number of vulnerable road users. However, data protection regulations require that individuals are anonymized in such datasets. In this work, we introduce a novel deep learning-based pipeline for face anonymization in the context of ITS. In contrast to related methods, we do not use generative adversarial networks (GANs) but build upon recent advances in diffusion models. We propose a two-stage method, which contains a face detection model followed by a latent diffusion model to generate realistic face in-paintings. To demonstrate the versatility of anonymized images, we train segmentation methods on anonymized data and evaluate them on non-anonymized data. Our experiments reveal that our pipeline is better suited to anonymize data for segmentation than naive methods and performs comparably with recent GAN-based methods. Moreover, face detectors achieve higher mAP scores for faces anonymized by our method compared to naive or recent GAN-based methods.
CV Apr 24
Railway Artificial Intelligence Learning Benchmark (RAIL-BENCH): A Benchmark Suite for Perception in the Railway Domain
Annika Bätz, Pavel Klasek, Seo-Young Ham et al.
Automated train operation on existing railway infrastructure requires robust camera-based perception, yet the railway domain lacks public benchmark suites with standardized evaluation protocols that would enable reproducible comparison of approaches. We present RAIL-BENCH, the first perception benchmark suite for the railway domain. It comprises five challenges - rail track detection, object detection, vegetation segmentation, multi-object tracking, and monocular visual odometry - each tailored to the specific characteristics of railway environments. RAIL-BENCH provides curated training and test datasets drawn from diverse real-world scenarios, evaluation metrics, and public scoreboards (https://www.mrt.kit.edu/railbench). For the rail track detection challenge we introduce LineAP, a novel segment-based average precision metric that evaluates the geometric accuracy of polyline predictions independently of instance-level grouping, addressing key limitations of existing line detection metrics.
RO Mar 13
Better Safe Than Sorry: Enhancing Arbitration Graphs for Safe and Robust Autonomous Decision-Making
Piotr Spieker, Nick Le Large, Martin Lauer
This paper introduces an extension to the arbitration graph framework designed to enhance the safety and robustness of autonomous systems in complex, dynamic environments. Building on the flexibility and scalability of arbitration graphs, the proposed method incorporates a verification step and structured fallback layers in the decision-making process. This ensures that only verified and safe commands are executed while enabling graceful degradation in the presence of unexpected faults or bugs. The approach is demonstrated using a Pac-Man simulation and further validated in the context of autonomous driving, where it shows significant reductions in accident risk and improvements in overall system safety. The bottom-up design of arbitration graphs allows for an incremental integration of new behavior components. The extension presented in this work enables the integration of experimental or immature behavior components while maintaining system safety by clearly and precisely defining the conditions under which behaviors are considered safe. The proposed method is implemented as a ready-to-use, header-only C++ library, published under the MIT License. Together with the Pac-Man demo, it is available at github.com/KIT-MRT/arbitration_graphs.
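The verification step and fallback layers described in this abstract can be illustrated with a minimal Python sketch. It is not the API of the published C++ library: the `Behavior` class, the `arbitrate` function, the speed-based verifier, and all values are hypothetical names and numbers chosen for illustration. The sketch assumes a simple priority ordering in which the first applicable behavior whose command passes verification wins, and a crashing or unsafe component is skipped rather than executed.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Behavior:
    """Hypothetical behavior component: name, applicability check, command generator."""
    name: str
    is_applicable: Callable[[dict], bool]
    get_command: Callable[[dict], float]


def arbitrate(
    behaviors: List[Behavior],
    verifier: Callable[[dict, float], bool],
    fallback_command: float,
    state: dict,
) -> Tuple[str, float]:
    """Return (source, command) of the first applicable behavior that passes verification."""
    for behavior in behaviors:  # ordered from highest to lowest priority
        if not behavior.is_applicable(state):
            continue
        try:
            command = behavior.get_command(state)
        except Exception:
            continue  # a faulty component is skipped instead of crashing the system
        if verifier(state, command):
            return behavior.name, command
    # Last-resort layer: no behavior produced a verified command.
    return "fallback", fallback_command


def speed_verifier(state: dict, command: float) -> bool:
    """Assumed safety condition: the commanded speed stays within [0, speed_limit]."""
    return 0.0 <= command <= state["speed_limit"]


behaviors = [
    Behavior("experimental", lambda s: True, lambda s: s["speed_limit"] * 2.0),  # unsafe
    Behavior("buggy", lambda s: True, lambda s: 1.0 / 0.0),  # raises ZeroDivisionError
    Behavior("cruise", lambda s: True, lambda s: s["speed_limit"] * 0.5),  # safe
]
result = arbitrate(behaviors, speed_verifier, 0.0, {"speed_limit": 10.0})
no_option = arbitrate(behaviors[:2], speed_verifier, 0.0, {"speed_limit": 10.0})
```

Here the experimental behavior is rejected by the verifier, the buggy one is skipped after raising, and the lower-priority cruise behavior is executed. When no behavior passes, the fallback command is used, which is one simple reading of graceful degradation.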
CV Mar 24
LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset
Royden Wagner, Omer Sahin Tas, Jaime Villa et al.
In real-world domains such as self-driving, generalization to rare scenarios remains a fundamental challenge. To address this, we introduce a new dataset designed for end-to-end driving that focuses on long-tail driving events. We provide multi-view video data, trajectories, high-level instructions, and detailed reasoning traces, facilitating in-context learning and few-shot generalization. The resulting benchmark for multimodal models, such as VLMs and VLAs, goes beyond safety and comfort metrics by evaluating instruction following and semantic coherence between model outputs. The multilingual reasoning traces in English, Spanish, and Chinese are from domain experts with diverse cultural backgrounds. Thus, our dataset is a unique resource for studying how different forms of reasoning affect driving competence. Our dataset is available at: https://hf.co/datasets/kit-mrt/kitscenes-longtail
CV Aug 18, 2024
Adversarial Attacked Teacher for Unsupervised Domain Adaptive Object Detection
Kaiwen Wang, Yinzhe Shen, Martin Lauer
Object detectors encounter challenges in handling domain shifts. Cutting-edge domain adaptive object detection methods use the teacher-student framework and domain adversarial learning to generate domain-invariant pseudo-labels for self-training. However, the pseudo-labels generated by the teacher model tend to be biased towards the majority class and often mistakenly include overconfident false positives and underconfident false negatives. We reveal that pseudo-labels vulnerable to adversarial attacks are more likely to be low-quality. To address this, we propose a simple yet effective framework named Adversarial Attacked Teacher (AAT) to improve the quality of pseudo-labels. Specifically, we apply adversarial attacks to the teacher model, prompting it to generate adversarial pseudo-labels to correct bias, suppress overconfidence, and encourage underconfident proposals. An adaptive pseudo-label regularization is introduced to emphasize the influence of pseudo-labels with high certainty and reduce the negative impacts of uncertain predictions. Moreover, robust minority objects verified by pseudo-label regularization are oversampled to minimize dataset imbalance without introducing false positives. Extensive experiments conducted on various datasets demonstrate that AAT achieves superior performance, reaching 52.6 mAP on Clipart1k, surpassing the previous state-of-the-art by 6.7%.
RO May 2, 2024
An Approach to Systematic Data Acquisition and Data-Driven Simulation for the Safety Testing of Automated Driving Functions
Leon Eisemann, Mirjam Fehling-Kaschek, Henrik Gommel et al.
With growing complexity and criticality of automated driving functions in road traffic and their operational design domains (ODD), there is increasing demand for covering significant proportions of development, validation, and verification in virtual environments and through simulation models. If, however, simulations are meant not only to augment real-world experiments, but to replace them, quantitative approaches are required that measure to what degree and under which preconditions simulation models adequately represent reality, so that their results can be used accordingly. Especially in R&D areas related to the safety impact of the "open world", there is a significant shortage of real-world data to parameterize and/or validate simulations, particularly with respect to the behavior of human traffic participants, whom automated driving functions will meet in mixed traffic. We present an approach to systematically acquire data in public traffic by heterogeneous means, transform it into a unified representation, and use it to automatically parameterize traffic behavior models for use in data-driven virtual validation of automated driving functions.
LG Mar 18, 2024
PITA: Physics-Informed Trajectory Autoencoder
Johannes Fischer, Kevin Rösch, Martin Lauer et al.
Validating robotic systems in safety-critical applications requires testing in many scenarios, including rare edge cases that are unlikely to occur, which makes it necessary to complement real-world testing with testing in simulation. Generative models can be used to augment real-world datasets with generated data to produce edge case scenarios by sampling in a learned latent space. Autoencoders can learn said latent representation for a specific domain by learning to reconstruct the input data from a lower-dimensional intermediate representation. However, the resulting trajectories are not necessarily physically plausible, but instead typically contain noise that is not present in the input trajectory. To resolve this issue, we propose the novel Physics-Informed Trajectory Autoencoder (PITA) architecture, which incorporates a physical dynamics model into the loss function of the autoencoder. This results in smooth trajectories that not only reconstruct the input trajectory but also adhere to the physical model. We evaluate PITA on a real-world dataset of vehicle trajectories and compare its performance to a normal autoencoder and a state-of-the-art action-space autoencoder.
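The idea of adding a physical dynamics model to an autoencoder loss can be sketched as follows. This is an illustration under stated assumptions, not the PITA loss itself: the abstract does not name the dynamics model, so the sketch assumes a state layout `(x, y, vx, vy)` and a constant-velocity step `x[k+1] = x[k] + v[k] * dt`. The function names and the `weight` parameter are hypothetical.

```python
from typing import List, Tuple

State = Tuple[float, float, float, float]  # assumed layout: (x, y, vx, vy)


def reconstruction_loss(recon: List[State], target: List[State]) -> float:
    """Mean squared error over all state components of all time steps."""
    total = sum(
        (r - t) ** 2
        for recon_state, target_state in zip(recon, target)
        for r, t in zip(recon_state, target_state)
    )
    return total / (len(target) * 4)


def physics_loss(recon: List[State], dt: float) -> float:
    """Mean squared residual of the assumed model x[k+1] = x[k] + v[k] * dt."""
    total = 0.0
    for (x0, y0, vx0, vy0), (x1, y1, _, _) in zip(recon, recon[1:]):
        total += (x1 - (x0 + vx0 * dt)) ** 2 + (y1 - (y0 + vy0 * dt)) ** 2
    return total / (2 * (len(recon) - 1))


def physics_informed_loss(
    recon: List[State], target: List[State], dt: float, weight: float = 1.0
) -> float:
    """Reconstruction term plus a weighted physics-consistency term."""
    return reconstruction_loss(recon, target) + weight * physics_loss(recon, dt)


# A trajectory that obeys the assumed model, and a copy with one noisy position.
clean = [(0.0, 0.0, 1.0, 0.0), (1.0, 0.0, 1.0, 0.0), (2.0, 0.0, 1.0, 0.0)]
noisy = [(0.0, 0.0, 1.0, 0.0), (1.5, 0.0, 1.0, 0.0), (2.0, 0.0, 1.0, 0.0)]
clean_loss = physics_informed_loss(clean, clean, dt=1.0)
noisy_loss = physics_informed_loss(noisy, clean, dt=1.0)
```

The physics term penalizes the noisy reconstruction even where the pointwise reconstruction error is small, which is the mechanism by which such a loss favors smooth, physically consistent outputs.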
CV Mar 23, 2024
Adversarial Defense Teacher for Cross-Domain Object Detection under Poor Visibility Conditions
Kaiwen Wang, Yinzhe Shen, Martin Lauer
Existing object detectors encounter challenges in handling domain shifts between training and real-world data, particularly under poor visibility conditions like fog and night. Cutting-edge cross-domain object detection methods use teacher-student frameworks and compel teacher and student models to produce consistent predictions under weak and strong augmentations, respectively. In this paper, we reveal that manually crafted augmentations are insufficient for optimal teaching and present a simple yet effective framework named Adversarial Defense Teacher (ADT), leveraging adversarial defense to enhance teaching quality. Specifically, we employ adversarial attacks, encouraging the model to generalize on subtly perturbed inputs that effectively deceive the model. To address small objects under poor visibility conditions, we propose a Zoom-in Zoom-out strategy, which zooms in on images for better pseudo-labels and zooms out images and pseudo-labels to learn refined features. Our results demonstrate that ADT achieves superior performance, reaching 54.5% mAP on Foggy Cityscapes, surpassing the previous state-of-the-art by 2.6% mAP.
RO Jul 15, 2021
High-level Decisions from a Safe Maneuver Catalog with Reinforcement Learning for Safe and Cooperative Automated Merging
Danial Kamran, Yu Ren, Martin Lauer
Reinforcement learning (RL) has recently been used for solving challenging decision-making problems in the context of automated driving. However, one of the main drawbacks of the presented RL-based policies is the lack of safety guarantees, since they strive to reduce the expected number of collisions but still tolerate them. In this paper, we propose an efficient RL-based decision-making pipeline for safe and cooperative automated driving in merging scenarios. The RL agent is able to predict the current situation and provide high-level decisions, specifying the operation mode of the low level planner which is responsible for safety. In order to learn a more generic policy, we propose a scalable RL architecture for the merging scenario that is not sensitive to changes in the environment configurations. According to our experiments, the proposed RL agent can efficiently identify cooperative drivers from their vehicle state history and generate interactive maneuvers, resulting in faster and more comfortable automated driving. At the same time, thanks to the safety constraints inside the planner, all of the maneuvers are collision free and safe.
RO Jun 8, 2021
Efficient Sampling in POMDPs with Lipschitz Bandits for Motion Planning in Continuous Spaces
Ömer Şahin Taş, Felix Hauser, Martin Lauer
Decision making under uncertainty can be framed as a partially observable Markov decision process (POMDP). Finding exact solutions of POMDPs is generally computationally intractable, but the solution can be approximated by sampling-based approaches. These sampling-based POMDP solvers rely on multi-armed bandit (MAB) heuristics, which assume the outcomes of different actions to be uncorrelated. In some applications, like motion planning in continuous spaces, similar actions yield similar outcomes. In this paper, we utilize variants of MAB heuristics that make Lipschitz continuity assumptions on the outcomes of actions to improve the efficiency of sampling-based planning approaches. We demonstrate the effectiveness of this approach in the context of motion planning for automated driving.
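To make the Lipschitz assumption concrete, the sketch below tightens standard UCB indices with bounds implied by neighboring actions. If the expected outcome f is L-Lipschitz in the action, then f(a_i) <= f(a_j) + L * |a_i - a_j|, so any upper bound for arm j also yields an upper bound for arm i. This is one simple variant chosen for illustration, not necessarily the heuristic used in the paper. The function names, the exploration constant `c`, and the example numbers are assumptions.

```python
import math
from typing import List


def ucb_indices(means: List[float], counts: List[int], total: int, c: float = 1.0) -> List[float]:
    """UCB1-style index per arm; arms that were never sampled get +inf."""
    return [
        m + c * math.sqrt(math.log(total) / n) if n > 0 else math.inf
        for m, n in zip(means, counts)
    ]


def lipschitz_ucb_indices(
    actions: List[float],
    means: List[float],
    counts: List[int],
    total: int,
    lipschitz: float,
    c: float = 1.0,
) -> List[float]:
    """Tighten each arm's index using the upper bounds of all arms (including itself)."""
    base = ucb_indices(means, counts, total, c)
    return [
        min(base[j] + lipschitz * abs(a_i - a_j) for j, a_j in enumerate(actions))
        for a_i in actions
    ]


# One well-sampled action and two unsampled ones at different distances from it.
actions = [0.0, 0.1, 1.0]
means = [1.0, 0.0, 0.0]
counts = [100, 0, 0]
plain = ucb_indices(means, counts, total=100)
tight = lipschitz_ucb_indices(actions, means, counts, total=100, lipschitz=1.0)
```

With plain UCB, both unsampled actions have an infinite index and must each be tried. With the Lipschitz bound, the action close to the sampled one inherits a finite index only slightly above it, so information from one sample is shared across similar actions.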
RO Jul 14, 2020
Fast Lane-Level Intersection Estimation using Markov Chain Monte Carlo Sampling and B-Spline Refinement
Annika Meyer, Jonas Walter, Martin Lauer
Estimating the current scene and understanding the potential maneuvers are essential capabilities of automated vehicles. Most approaches rely heavily on the correctness of maps, but neglect the possibility of outdated information. We present an approach that is able to estimate lanes without relying on any map prior. The estimation is based solely on the trajectories of other traffic participants and is thereby able to incorporate complex environments. In particular, we are able to estimate the scene in the presence of heavy traffic and occlusions. The algorithm first estimates a coarse lane-level intersection model by Markov chain Monte Carlo sampling and refines it later by aligning the lane course with the measurements using a non-linear least squares formulation. We model the lanes as 1D cubic B-splines and can achieve errors of less than 10 cm in real time.
AI Apr 9, 2020
Risk-Aware High-level Decisions for Automated Driving at Occluded Intersections with Reinforcement Learning
Danial Kamran, Carlos Fernandez Lopez, Martin Lauer et al.
Reinforcement learning is nowadays a popular framework for solving different decision-making problems in automated driving. However, there are still some crucial challenges that need to be addressed to provide more reliable policies. In this paper, we propose a generic risk-aware DQN approach in order to learn high-level actions for driving through unsignalized occluded intersections. The proposed state representation provides lane-based information, which allows it to be used in multi-lane scenarios. Moreover, we propose a risk-based reward function which punishes risky situations instead of only collision failures. Such a rewarding approach helps to incorporate risk prediction into our deep Q network and learn more reliable policies which are safer in challenging situations. The efficiency of the proposed approach is compared with a DQN learned with a conventional collision-based rewarding scheme and also with a rule-based intersection navigation policy. Evaluation results show that the proposed approach outperforms both of these methods. It provides safer actions than the collision-aware DQN approach and is less overcautious than the rule-based policy.
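The difference between a collision-based and a risk-based reward can be shown with a small sketch. The abstract does not define the risk measure or the reward magnitudes, so everything below is assumed for illustration: risk is approximated by a time gap to the closest conflicting vehicle, and `safe_gap_s`, `risk_penalty`, and the reward values are hypothetical.

```python
def collision_based_reward(collided: bool, reached_goal: bool) -> float:
    """Conventional scheme: only the terminal outcome is rewarded or punished."""
    if collided:
        return -1.0
    if reached_goal:
        return 1.0
    return 0.0


def risk_based_reward(
    collided: bool,
    reached_goal: bool,
    time_gap_s: float,
    safe_gap_s: float = 3.0,
    risk_penalty: float = 0.5,
) -> float:
    """Additionally punish risky but collision-free steps (assumed risk: small time gap)."""
    if collided:
        return -1.0
    if reached_goal:
        return 1.0
    if time_gap_s < safe_gap_s:
        # The penalty grows linearly as the gap shrinks toward zero.
        return -risk_penalty * (1.0 - time_gap_s / safe_gap_s)
    return 0.0


# A near miss: no collision, but only 1.5 s to the conflicting vehicle.
near_miss_conventional = collision_based_reward(collided=False, reached_goal=False)
near_miss_risk_aware = risk_based_reward(collided=False, reached_goal=False, time_gap_s=1.5)
```

Under the collision-based scheme the near miss is indistinguishable from a safe step, while the risk-based scheme gives the agent a learning signal before a collision ever occurs.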
RO Mar 2, 2020
Decision-Making for Automated Vehicles Using a Hierarchical Behavior-Based Arbitration Scheme
Piotr Franciszek Orzechowski, Christoph Burger, Martin Lauer
Behavior planning and decision-making are some of the biggest challenges for highly automated systems. A fully automated vehicle (AV) is confronted with numerous tactical and strategical choices. Most state-of-the-art AV platforms implement tactical and strategical behavior generation using finite state machines. However, these usually result in poor explainability, maintainability and scalability. Research in robotics has raised many architectures to mitigate these problems, most interestingly behavior-based systems and hybrid derivatives. Inspired by these approaches, we propose a hierarchical behavior-based architecture for tactical and strategical behavior generation in automated driving. It is a generalizing and scalable decision-making framework, utilizing modular behavior blocks to compose more complex behaviors in a bottom-up approach. The system is capable of combining a variety of scenario- and methodology-specific solutions, like POMDPs, RRT* or learning-based behavior, into one understandable and traceable architecture. We extend the hierarchical behavior-based arbitration concept to address scenarios where multiple behavior options are applicable but have no clear priority against each other. Then, we formulate the behavior generation stack for automated driving in urban and highway environments, incorporating parking and emergency behaviors as well. Finally, we illustrate our design in an explanatory evaluation.
CV Jun 6, 2019
Anytime Lane-Level Intersection Estimation Based on Trajectories of Other Traffic Participants
Annika Meyer, Jonas Walter, Martin Lauer et al.
Estimating and understanding the current scene is an indispensable capability of automated vehicles. Usually, maps are used as a prior for interpreting sensor measurements in order to drive safely and comfortably. Only a few approaches take into account that maps might be outdated and lead to wrong assumptions about the environment. This work estimates a lane-level intersection topology without any map prior by observing the trajectories of other traffic participants. We are able to deliver both a coarse lane-level topology and the lane course inside and outside of the intersection using Markov chain Monte Carlo sampling. The model is limited neither to a fixed number of lanes or arms nor to a particular intersection topology. We present our results on an evaluation set of 1000 simulated intersections and achieve 99.9% accuracy on the topology estimation, which takes only 36 ms when utilizing tracked object detections. The precise lane course on these intersections is estimated with an error of 15 cm on average after 140 ms. Our approach shows a similar level of precision on 14 real-world intersections, with 18 cm average deviation on simple intersections and 27 cm for more complex scenarios. Here the estimation takes only 113 ms in total.
CV Jun 4, 2019
Localization in Aerial Imagery with Grid Maps using LocGAN
Haohao Hu, Junyi Zhu, Sascha Wirges et al.
In this work, we present LocGAN, our localization approach based on geo-referenced aerial imagery and LiDAR grid maps. Currently, most self-localization approaches relate the current sensor observations to a map generated from previously acquired data. Unfortunately, this data is not always available and the generated maps are usually specific to the sensor setup. Global Navigation Satellite Systems (GNSS) can overcome this problem. However, they are not always reliable, especially in urban areas, due to multi-path and shadowing effects. Since aerial imagery is usually available, we can use it as prior information. To match aerial images with grid maps, we use conditional Generative Adversarial Networks (cGANs) which transform aerial images to the grid map domain. The transformation between the predicted and measured grid map is estimated using a localization network (LocNet). Given the geo-referenced aerial image transformation, the vehicle pose can be estimated. Evaluations performed on data recorded in the Karlsruhe region, Germany, show that our LocGAN approach provides reliable global localization results.
RO Jan 31, 2019
Capturing Object Detection Uncertainty in Multi-Layer Grid Maps
Sascha Wirges, Marcel Reith-Braun, Martin Lauer et al.
We propose a deep convolutional object detector for automated driving applications that also estimates classification, pose and shape uncertainty of each detected object. The input consists of a multi-layer grid map which is well-suited for sensor fusion, free-space estimation and machine learning. Based on the estimated pose and shape uncertainty we approximate object hulls with bounded collision probability which we find helpful for subsequent trajectory planning tasks. We train our models based on the KITTI object detection data set. In a quantitative and qualitative evaluation some models show a similar performance and superior robustness compared to previously developed object detectors. However, our evaluation also points to undesired data set properties which should be addressed when training data-driven models or creating new data sets.
RO Jul 19, 2018
LIMO: Lidar-Monocular Visual Odometry
Johannes Graeter, Alexander Wilczynski, Martin Lauer
Higher-level functionality in autonomous driving depends strongly on a precise motion estimate of the vehicle. Powerful algorithms have been developed. However, the great majority of them focus on either binocular imagery or pure LIDAR measurements. The promising combination of camera and LIDAR for visual localization has mostly been neglected. In this work we fill this gap by proposing a depth extraction algorithm from LIDAR measurements for camera feature tracks and estimating motion by robustified keyframe-based Bundle Adjustment. Semantic labeling is used for outlier rejection and weighting of vegetation landmarks. The capability of this sensor combination is demonstrated on the competitive KITTI dataset, achieving a placement among the top 15. The code is released to the community.
RO Jul 3, 2018
Tackling Occlusions & Limited Sensor Range with Set-based Safety Verification
Piotr Franciszek Orzechowski, Annika Meyer, Martin Lauer
Provable safety is one of the most critical challenges in automated driving. The behavior of numerous traffic participants in a scene cannot be predicted reliably due to complex interdependencies and the indiscriminate behavior of humans. Additionally, we face high uncertainties and only incomplete environment knowledge. Recent approaches minimize risk with probabilistic and machine learning methods - even under occlusions. These generate comfortable behavior with good traffic flow, but cannot guarantee safety of their maneuvers. Therefore, we contribute a safety verification method for trajectories under occlusions. The field-of-view of the ego vehicle and a map are used to identify critical sensing field edges, each representing a potentially hidden obstacle. The state of occluded obstacles is unknown, but can be over-approximated by intervals over all possible states. Then set-based methods are extended to provide occupancy predictions for obstacles with state intervals. The proposed method can verify the safety of given trajectories (e.g. if they ensure collision-free fail-safe maneuver options) w.r.t. arbitrary safe-state formulations. The potential for provably safe trajectory planning is shown in three evaluative scenarios.
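The interval over-approximation of a potentially hidden obstacle can be sketched in one dimension along a lane. This is a strongly simplified illustration, not the paper's set-based method: it assumes a longitudinal coordinate only, an unknown but constant speed within the given interval, and a fixed obstacle length. All names and numbers are hypothetical.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def predict_occupancy(
    position: Interval, speed: Interval, horizon_s: float, length_m: float
) -> Interval:
    """Over-approximate the lane span an obstacle may occupy at any time in [0, horizon_s].

    `position` bounds the rear end of the obstacle and `speed` bounds its
    (assumed constant) velocity along the lane.
    """
    p_min, p_max = position
    v_min, v_max = speed
    rear = p_min + min(0.0, v_min * horizon_s)
    front = p_max + max(0.0, v_max * horizon_s) + length_m
    return rear, front


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_span_safe(
    ego_span: Interval,
    hidden_position: Interval,
    hidden_speed: Interval,
    horizon_s: float,
    length_m: float = 5.0,
) -> bool:
    """Safe if the ego span cannot intersect any possible occupancy of the hidden obstacle."""
    occupancy = predict_occupancy(hidden_position, hidden_speed, horizon_s, length_m)
    return not overlaps(ego_span, occupancy)


# A hidden vehicle somewhere 30 m to 50 m before the conflict zone, driving 0 to 15 m/s.
# The ego vehicle occupies the lane span [10 m, 20 m].
safe_for_2s = is_span_safe((10.0, 20.0), (-50.0, -30.0), (0.0, 15.0), horizon_s=2.0)
safe_for_3s = is_span_safe((10.0, 20.0), (-50.0, -30.0), (0.0, 15.0), horizon_s=3.0)
```

Because every possible state of the occluded obstacle is contained in the intervals, a negative overlap check holds for all of them, which is what makes the verdict a guarantee rather than a probability. The price is conservatism: a longer horizon quickly enlarges the predicted occupancy.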
RO May 14, 2018
Generating Comfortable, Safe and Comprehensible Trajectories for Automated Vehicles in Mixed Traffic
Maximilian Naumann, Martin Lauer, Christoph Stiller
While motion planning approaches for automated driving often focus on safety and mathematical optimality with respect to technical parameters, they barely consider convenience, perceived safety for the passenger and comprehensibility for other traffic participants. For automated driving in mixed traffic, however, this is key to reaching public acceptance. In this paper, we revise the problem statement of motion planning in mixed traffic: Instead of largely simplifying the motion planning problem to a convex optimization problem, we keep a more complex probabilistic multi-agent model and strive for a near-optimal solution. We assume cooperation of other traffic participants, while remaining aware that this assumption may be violated. This approach yields solutions that are provably safe in all situations, and convenient and comprehensible in situations that are also unambiguous for humans. Thus, it outperforms existing approaches in mixed traffic scenarios, as we show in simulation.
CV Feb 23, 2018
An Approach to Vehicle Trajectory Prediction Using Automatically Generated Traffic Maps
Jannik Quehl, Haohao Hu, Sascha Wirges et al.
Trajectory and intention prediction of traffic participants is an important task in automated driving and crucial for safe interaction with the environment. In this paper, we present a new approach to vehicle trajectory prediction based on automatically generated maps containing statistical information about the behavior of traffic participants in a given area. These maps are generated based on trajectory observations using image processing and map matching techniques and contain all typical vehicle movements and probabilities in the considered area. Our prediction approach matches an observed trajectory to a behavior contained in the map and uses this information to generate a prediction. We evaluated our approach on a dataset containing over 14000 trajectories and found that it produces significantly more precise mid-term predictions compared to motion model-based prediction approaches.
CV Aug 1, 2017
Momo: Monocular Motion Estimation on Manifolds
Johannes Graeter, Tobias Strauss, Martin Lauer
Knowledge about the location of a vehicle is indispensable for autonomous driving. In order to apply global localisation methods, a pose prior must be known which can be obtained from visual odometry. The quality and robustness of that prior determine the success of localisation. Momo is a monocular frame-to-frame motion estimation methodology providing a high quality visual odometry for that purpose. By taking into account the motion model of the vehicle, reliability and accuracy of the pose prior are significantly improved. We show that especially in low-structure environments Momo outperforms the state of the art. Moreover, the method is designed so that multiple cameras with or without overlap can be integrated. The evaluation on the KITTI dataset and on a proper multi-camera dataset shows that even with only 100 to 300 feature matches the prior is estimated with high accuracy and in real time.
CV Jun 19, 2017
Pedestrian Prediction by Planning using Deep Neural Networks
Eike Rehder, Florian Wirth, Martin Lauer et al.
Accurate traffic participant prediction is the prerequisite for collision avoidance of autonomous vehicles. In this work, we predict pedestrians by emulating their own motion planning. From online observations, we infer a mixture density function for possible destinations. We use this result as the goal states of a planning stage that performs motion prediction based on common behavior patterns. The entire system is modeled as one monolithic neural network and trained via inverse reinforcement learning. Experimental validation on real-world data shows the system's ability to predict both destinations and trajectories accurately.