CVAug 4, 2023Code
Generation of Realistic Synthetic Raw Radar Data for Automated Driving Applications using Generative Adversarial NetworksEduardo C. Fidelis, Fabio Reway, Herick Y. S. Ribeiro et al.
The main approaches for simulating FMCW radar are based on ray tracing, which is usually computationally intensive and do not account for background noise. This work proposes a faster method for FMCW radar simulation capable of generating synthetic raw radar data using generative adversarial networks (GAN). The code and pre-trained weights are open-source and available on GitHub. This method generates 16 simultaneous chirps, which allows the generated data to be used for the further development of algorithms for processing radar data (filtering and clustering). This can increase the potential for data augmentation, e.g., by generating data in non-existent or safety-critical scenarios that are not reproducible in real life. In this work, the GAN was trained with radar measurements of a motorcycle and used to generate synthetic raw radar data of a motorcycle traveling in a straight line. For generating this data, the distance of the motorcycle and Gaussian noise are used as input to the neural network. The synthetic generated radar chirps were evaluated using the Frechet Inception Distance (FID). Then, the Range-Azimuth (RA) map is calculated twice: first, based on synthetic data using this GAN and, second, based on real data. Based on these RA maps, an algorithm with adaptive threshold and edge detection is used for object detection. The results have shown that the data is realistic in terms of coherent radar reflections of the motorcycle and background noise based on the comparison of chirps, the RA maps and the object detection results. Thus, the proposed method in this work has shown to minimize the simulation-to-reality gap for the generation of radar data.
14.5CVApr 20
Embedding Arithmetic: A Lightweight, Tuning-Free Framework for Post-hoc Bias Mitigation in Text-to-Image ModelsVenkatesh Thirugnana Sambandham, Torsten Schön
Modern text-to-image (T2I) models amplify harmful societal biases, challenging their ethical deployment. We introduce an inference-time method that reliably mitigates social bias while keeping prompt semantics and visual context (background, layout, and style) intact. This ensures context persistency and provides a controllable parameter to adjust mitigation strength, giving practitioners fine-grained control over fairness-coherence trade-offs. Using Embedding Arithmetic, we analyze how bias is structured in the embedding space and correct it without altering model weights, prompts, or datasets. Experiments on FLUX 1.0-Dev and Stable Diffusion 3.5-Large show that the conditional embedding space forms a complex, entangled manifold rather than a grid of disentangled concepts. To rigorously assess semantic preservation beyond the circularity and bias limitations of of CLIP scores, we propose the Concept Coherence Score (CCS). Evaluated against this robust metric, our lightweight, tuning-free method significantly outperforms existing baselines in improving diversity while maintaining high concept coherence, effectively resolving the critical fairness-coherence trade-off. By characterizing how models represent social concepts, we establish geometric understanding of latent space as a principled path toward more transparent, controllable, and fair image generation.
43.8CVApr 15
Operationalizing Fairness in Text-to-Image Models: A Survey of Bias, Fairness Audits and Mitigation StrategiesMegan Smith, Venkatesh Thirugnana Sambandham, Florian Richter et al.
Text-to-Image (T2I) generation models have been widely adopted across various industries, yet are criticized for frequently exhibiting societal stereotypes. While a growing body of research has emerged to evaluate and mitigate these biases, the field at present contends with conceptual ambiguity, for example terms like "bias" and "fairness" are not always clearly distinguished and often lack clear operational definitions. This paper provides a comprehensive systematic review of T2I fairness literature, organizing existing work into a taxonomy of bias types and fairness notions. We critically assess the gap between "target fairness" (normative ideals in T2I outputs) and "threshold fairness" (normative standards with actionable decision rules). Furthermore, we survey the landscape of mitigation strategies, ranging from prompt engineering to diffusion process manipulation. We conclude by proposing a new framework for operationalizing fairness that moves beyond descriptive metrics towards rigorous, target-based testing, offering an approach for more accountable generative AI development.
5.0AIMay 5
Real-Time Evaluation of Autonomous Systems under Adversarial AttacksAdithya Mohan, Xujun Xie, Venkatesh Thirugnana Sambandham et al.
Most evaluations of autonomous driving policies under adversarial conditions are conducted in simulation, due to cost efficiency and the absence of physical risk. However, purely virtual testing fails to capture structural inconsistencies, supervision constraints, and state-representation effects that arise in real-world data and fundamentally shape policy robustness. This work presents an offline trajectory-learning and adversarial robustness evaluation framework grounded in real-world intersection driving data. Within a controlled data contract, we train and compare three trajectory-learning paradigms: Multi-Layer Perceptron (MLP)-based Behavior Cloning (BC), Transformer-based object-tokenized BC, and inverse reinforcement learning (IRL) formulated within a Generative Adversarial Imitation Learning (GAIL) framework. Models are evaluated using Average Displacement Error (ADE) and Final Displacement Error (FDE). Inference-time robustness is assessed by subjecting trained policies to gradient-based adversarial perturbations across multiple intersection scenarios, yielding a structured robustness evaluation matrix. Results show that state-structure design and architectural inductive biases critically influence adversarial stability, leading to markedly different robustness profiles despite comparable nominal prediction accuracy (ADE < 0.08). Inference-time Projected Gradient Descent (PGD) attacks induce final displacement errors of up to approximately 8 meters. The proposed framework establishes a scalable benchmark for studying offline trajectory learning and adversarial robustness in real-world autonomous driving settings.
CVJan 21
DrivIng: A Large-Scale Multimodal Driving Dataset with Full Digital Twin IntegrationDominik Rößle, Xujun Xie, Adithya Mohan et al.
Perception is a cornerstone of autonomous driving, enabling vehicles to understand their surroundings and make safe, reliable decisions. Developing robust perception algorithms requires large-scale, high-quality datasets that cover diverse driving conditions and support thorough evaluation. Existing datasets often lack a high-fidelity digital twin, limiting systematic testing, edge-case simulation, sensor modification, and sim-to-real evaluations. To address this gap, we present DrivIng, a large-scale multimodal dataset with a complete geo-referenced digital twin of a ~18 km route spanning urban, suburban, and highway segments. Our dataset provides continuous recordings from six RGB cameras, one LiDAR, and high-precision ADMA-based localization, captured across day, dusk, and night. All sequences are annotated at 10 Hz with 3D bounding boxes and track IDs across 12 classes, yielding ~1.2 million annotated instances. Alongside the benefits of a digital twin, DrivIng enables a 1-to-1 transfer of real traffic into simulation, preserving agent interactions while enabling realistic and flexible scenario testing. To support reproducible research and robust validation, we benchmark DrivIng with state-of-the-art perception models and publicly release the dataset, digital twin, HD map, and codebase.
LGMar 28, 2025
Machine Learning Models for Soil Parameter Prediction Based on Satellite, Weather, Clay and Yield DataCalvin Kammerlander, Viola Kolb, Marinus Luegmair et al.
Efficient nutrient management and precise fertilization are essential for advancing modern agriculture, particularly in regions striving to optimize crop yields sustainably. The AgroLens project endeavors to address this challenge by develop ing Machine Learning (ML)-based methodologies to predict soil nutrient levels without reliance on laboratory tests. By leveraging state of the art techniques, the project lays a foundation for acionable insights to improve agricultural productivity in resource-constrained areas, such as Africa. The approach begins with the development of a robust European model using the LUCAS Soil dataset and Sentinel-2 satellite imagery to estimate key soil properties, including phosphorus, potassium, nitrogen, and pH levels. This model is then enhanced by integrating supplementary features, such as weather data, harvest rates, and Clay AI-generated embeddings. This report details the methodological framework, data preprocessing strategies, and ML pipelines employed in this project. Advanced algorithms, including Random Forests, Extreme Gradient Boosting (XGBoost), and Fully Connected Neural Networks (FCNN), were implemented and finetuned for precise nutrient prediction. Results showcase robust model performance, with root mean square error values meeting stringent accuracy thresholds. By establishing a reproducible and scalable pipeline for soil nutrient prediction, this research paves the way for transformative agricultural applications, including precision fertilization and improved resource allocation in underresourced regions like Africa.
CVOct 27, 2025
UrbanIng-V2X: A Large-Scale Multi-Vehicle, Multi-Infrastructure Dataset Across Multiple Intersections for Cooperative PerceptionKarthikeyan Chandra Sekaran, Markus Geisler, Dominik Rößle et al.
Recent cooperative perception datasets have played a crucial role in advancing smart mobility applications by enabling information exchange between intelligent agents, helping to overcome challenges such as occlusions and improving overall scene understanding. While some existing real-world datasets incorporate both vehicle-to-vehicle and vehicle-to-infrastructure interactions, they are typically limited to a single intersection or a single vehicle. A comprehensive perception dataset featuring multiple connected vehicles and infrastructure sensors across several intersections remains unavailable, limiting the benchmarking of algorithms in diverse traffic environments. Consequently, overfitting can occur, and models may demonstrate misleadingly high performance due to similar intersection layouts and traffic participant behavior. To address this gap, we introduce UrbanIng-V2X, the first large-scale, multi-modal dataset supporting cooperative perception involving vehicles and infrastructure sensors deployed across three urban intersections in Ingolstadt, Germany. UrbanIng-V2X consists of 34 temporally aligned and spatially calibrated sensor sequences, each lasting 20 seconds. All sequences contain recordings from one of three intersections, involving two vehicles and up to three infrastructure-mounted sensor poles operating in coordinated scenarios. In total, UrbanIng-V2X provides data from 12 vehicle-mounted RGB cameras, 2 vehicle LiDARs, 17 infrastructure thermal cameras, and 12 infrastructure LiDARs. All sequences are annotated at a frequency of 10 Hz with 3D bounding boxes spanning 13 object classes, resulting in approximately 712k annotated instances across the dataset. We provide comprehensive evaluations using state-of-the-art cooperative perception methods and publicly release the codebase, dataset, HD map, and a digital twin of the complete data collection environment.
LGJul 22, 2025
Advancing Robustness in Deep Reinforcement Learning with an Ensemble Defense ApproachAdithya Mohan, Dominik Rößle, Daniel Cremers et al.
Recent advancements in Deep Reinforcement Learning (DRL) have demonstrated its applicability across various domains, including robotics, healthcare, energy optimization, and autonomous driving. However, a critical question remains: How robust are DRL models when exposed to adversarial attacks? While existing defense mechanisms such as adversarial training and distillation enhance the resilience of DRL models, there remains a significant research gap regarding the integration of multiple defenses in autonomous driving scenarios specifically. This paper addresses this gap by proposing a novel ensemble-based defense architecture to mitigate adversarial attacks in autonomous driving. Our evaluation demonstrates that the proposed architecture significantly enhances the robustness of DRL models. Compared to the baseline under FGSM attacks, our ensemble method improves the mean reward from 5.87 to 18.38 (over 213% increase) and reduces the mean collision rate from 0.50 to 0.09 (an 82% decrease) in the highway scenario and merge scenario, outperforming all standalone defense strategies.
CVJan 25, 2024
Unlocking Past Information: Temporal Embeddings in Cooperative Bird's Eye View PredictionDominik Rößle, Jeremias Gerner, Klaus Bogenberger et al.
Accurate and comprehensive semantic segmentation of Bird's Eye View (BEV) is essential for ensuring safe and proactive navigation in autonomous driving. Although cooperative perception has exceeded the detection capabilities of single-agent systems, prevalent camera-based algorithms in cooperative perception neglect valuable information derived from historical observations. This limitation becomes critical during sensor failures or communication issues as cooperative perception reverts to single-agent perception, leading to degraded performance and incomplete BEV segmentation maps. This paper introduces TempCoBEV, a temporal module designed to incorporate historical cues into current observations, thereby improving the quality and reliability of BEV map segmentations. We propose an importance-guided attention architecture to effectively integrate temporal information that prioritizes relevant properties for BEV map segmentation. TempCoBEV is an independent temporal module that seamlessly integrates into state-of-the-art camera-based cooperative perception models. We demonstrate through extensive experiments on the OPV2V dataset that TempCoBEV performs better than non-temporal models in predicting current and future BEV map segmentations, particularly in scenarios involving communication failures. We show the efficacy of TempCoBEV and its capability to integrate historical cues into the current BEV map, improving predictions under optimal communication conditions by up to 2% and under communication failures by up to 19%. The code will be published on GitHub.
CVJul 5, 2021
VolNet: Estimating Human Body Part Volumes from a Single RGB ImageFabian Leinen, Vittorio Cozzolino, Torsten Schön
Human body volume estimation from a single RGB image is a challenging problem despite minimal attention from the research community. However VolNet, an architecture leveraging 2D and 3D pose estimation, body part segmentation and volume regression extracted from a single 2D RGB image combined with the subject's body height can be used to estimate the total body volume. VolNet is designed to predict the 2D and 3D pose as well as the body part segmentation in intermediate tasks. We generated a synthetic, large-scale dataset of photo-realistic images of human bodies with a wide range of body shapes and realistic poses called SURREALvols. By using Volnet and combining multiple stacked hourglass networks together with ResNeXt, our model correctly predicted the volume in ~82% of cases with a 10% tolerance threshold. This is a considerable improvement compared to state-of-the-art solutions such as BodyNet with only a ~38% success rate.
CVJun 4, 2021
Temporally coherent video anonymization through GAN inpaintingThangapavithraa Balaji, Patrick Blies, Georg Göri et al.
This work tackles the problem of temporally coherent face anonymization in natural video streams.We propose JaGAN, a two-stage system starting with detecting and masking out faces with black image patches in all individual frames of the video. The second stage leverages a privacy-preserving Video Generative Adversarial Network designed to inpaint the missing image patches with artificially generated faces. Our initial experiments reveal that image based generative models are not capable of inpainting patches showing temporal coherent appearance across neighboring video frames. To address this issue we introduce a newly curated video collection, which is made publicly available for the research community along with this paper. We also introduce the Identity Invariance Score IdI as a means to quantify temporal coherency between neighboring frames.
LGMar 8, 2021
Vision-Based Mobile Robotics Obstacle Avoidance With Deep Reinforcement LearningPatrick Wenzel, Torsten Schön, Laura Leal-Taixé et al.
Obstacle avoidance is a fundamental and challenging problem for autonomous navigation of mobile robots. In this paper, we consider the problem of obstacle avoidance in simple 3D environments where the robot has to solely rely on a single monocular camera. In particular, we are interested in solving this problem without relying on localization, mapping, or planning techniques. Most of the existing work consider obstacle avoidance as two separate problems, namely obstacle detection, and control. Inspired by the recent advantages of deep reinforcement learning in Atari games and understanding highly complex situations in Go, we tackle the obstacle avoidance problem as a data-driven end-to-end deep learning approach. Our approach takes raw images as input and generates control commands as output. We show that discrete action spaces are outperforming continuous control commands in terms of expected average reward in maze-like environments. Furthermore, we show how to accelerate the learning and increase the robustness of the policy by incorporating predicted depth maps by a generative adversarial network.
LGFeb 12, 2019
Towards Self-Supervised High Level Sensor FusionQadeer Khan, Torsten Schön, Patrick Wenzel
In this paper, we present a framework to control a self-driving car by fusing raw information from RGB images and depth maps. A deep neural network architecture is used for mapping the vision and depth information, respectively, to steering commands. This fusion of information from two sensor sources allows to provide redundancy and fault tolerance in the presence of sensor failures. Even if one of the input sensors fails to produce the correct output, the other functioning sensor would still be able to maneuver the car. Such redundancy is crucial in the critical application of self-driving cars. The experimental results have showed that our method is capable of learning to use the relevant sensor information even when one of the sensors fail without any explicit signal.
LGFeb 11, 2019
Semantic Label Reduction Techniques for Autonomous DrivingQadeer Khan, Torsten Schön, Patrick Wenzel
Semantic segmentation maps can be used as input to models for maneuvering the controls of a car. However, not all labels may be necessary for making the control decision. One would expect that certain labels such as road lanes or sidewalks would be more critical in comparison with labels for vegetation or buildings which may not have a direct influence on the car's driving decision. In this appendix, we evaluate and quantify how sensitive and important the different semantic labels are for controlling the car. Labels that do not influence the driving decision are remapped to other classes, thereby simplifying the task by reducing to only labels critical for driving of the vehicle.
LGFeb 11, 2019
Latent Space Reinforcement Learning for Steering Angle PredictionQadeer Khan, Torsten Schön, Patrick Wenzel
Model-free reinforcement learning has recently been shown to successfully learn navigation policies from raw sensor data. In this work, we address the problem of learning driving policies for an autonomous agent in a high-fidelity simulator. Building upon recent research that applies deep reinforcement learning to navigation problems, we present a modular deep reinforcement learning approach to predict the steering angle of the car from raw images. The first module extracts a low-dimensional latent semantic representation of the image. The control module trained with reinforcement learning takes the latent vector as input to predict the correct steering angle. The experimental results have showed that our method is capable of learning to maneuver the car without any human control signals.