CV · May 29
4D Radar Meets LiDAR and Camera: Cooperative Perception under Adverse Weather
Melih Yazgan, Iramm Hamdard, Qiyuan Wu et al.
Cooperative perception is important for autonomous driving but remains fragile when cameras and LiDAR degrade in adverse weather. We address this challenge by integrating 4D imaging radar as a weather-robust modality into collaborative perception and introducing a Doppler-guided spatial attention mechanism for multi-agent fusion. Our approach extends two representative backbones: a radar-camera pipeline where radar substitutes LiDAR, and a LiDAR-radar pipeline where radar complements LiDAR. To support evaluation, we release radar-augmented benchmarks, OPV2V-R and Adver-City-R, with physics-based LiDAR degradation. Experiments show strong robustness gains in fog and rain, including substantial improvements when radar replaces degraded LiDAR. Additional validation on MAN TruckScenes demonstrates transfer beyond simulation. Overall, our results highlight 4D imaging radar as a robust modality for all-weather collaborative perception. Dataset and code are available at: https://url.fzi.de/SlimComm.
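The abstract names a Doppler-guided spatial attention for multi-agent fusion but does not describe its form. The sketch below is one possible reading, built entirely on assumptions: every agent supplies a feature vector and a radial (Doppler) velocity for each shared BEV cell, agent 0 is the ego vehicle, and the attention logit per agent is a scaled dot product with the ego feature plus a hypothetical bias `alpha * |doppler|`. The effect is that agents observing motion in a cell receive more weight. The function name and `alpha` are illustrative, not taken from the paper.

```python
import math
from typing import List


def doppler_guided_fusion(
    feats: List[List[List[float]]],  # feats[agent][cell] -> feature vector (assumed layout)
    doppler: List[List[float]],      # doppler[agent][cell] -> radial velocity in m/s (assumed)
    alpha: float = 1.0,              # hypothetical weight of the Doppler bias term
) -> List[List[float]]:
    """Fuse per-agent BEV features cell by cell with softmax attention over agents.

    Logit for agent a at cell c: <ego_feat, agent_feat> / sqrt(dim) + alpha * |doppler|.
    Illustrative assumption only, not the published mechanism.
    """
    num_agents = len(feats)
    num_cells = len(feats[0])
    dim = len(feats[0][0])
    fused: List[List[float]] = []
    for c in range(num_cells):
        ego = feats[0][c]  # agent 0 is treated as the ego vehicle
        logits = []
        for a in range(num_agents):
            sim = sum(e * f for e, f in zip(ego, feats[a][c])) / math.sqrt(dim)
            logits.append(sim + alpha * abs(doppler[a][c]))
        peak = max(logits)  # subtract the max before exp for numerical stability
        weights = [math.exp(x - peak) for x in logits]
        total = sum(weights)
        weights = [w / total for w in weights]  # softmax over agents
        fused.append([
            sum(weights[a] * feats[a][c][k] for a in range(num_agents))
            for k in range(dim)
        ])
    return fused
```

With `alpha=0.0` the sketch reduces to plain similarity-based attention, which makes it easy to see how much the Doppler bias shifts weight toward an agent that observes a moving object.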
LG · Nov 20, 2023
MUVO: A Multimodal Generative World Model for Autonomous Driving with Geometric Representations
Daniel Bogdoll, Yitian Yang, Tim Joseph et al.
World models for autonomous driving have the potential to dramatically improve the reasoning capabilities of today's systems. However, most works focus on camera data, with only a few that leverage lidar data or combine both to better represent autonomous vehicle sensor setups. In addition, raw sensor predictions are less actionable than 3D occupancy predictions, but there are no works examining the effects of combining multimodal sensor data with 3D occupancy prediction. In this work, we perform a set of experiments with a MUltimodal World Model with Geometric VOxel representations (MUVO) to evaluate different sensor fusion strategies and better understand their effects on sensor data prediction. We also analyze potential weaknesses of current sensor fusion approaches and examine the benefits of additionally predicting 3D occupancy.
CV · Jun 5, 2025
Fool the Stoplight: Realistic Adversarial Patch Attacks on Traffic Light Detectors
Svetlana Pavlitska, Jamie Robb, Nikolai Polley et al.
Realistic adversarial attacks on various camera-based perception tasks of autonomous vehicles have been demonstrated successfully. However, only a few works have considered attacks on traffic light detectors. This work shows how CNNs for traffic light detection can be attacked with printed patches. We propose a threat model in which each instance of a traffic light is attacked with a patch placed under it, and describe a training strategy. We demonstrate successful adversarial patch attacks in universal settings. Our experiments show realistic targeted red-to-green label-flipping attacks and attacks on pictogram classification. Finally, we perform a real-world evaluation with printed patches and demonstrate attacks in a lab setting with a mobile traffic light for construction sites and in a test area with stationary traffic lights. Our code is available at https://github.com/KASTEL-MobilityLab/attacks-on-traffic-light-detection.
CV · Apr 22, 2024
Collaborative Perception Datasets in Autonomous Driving: A Survey
Melih Yazgan, Mythra Varun Akkanapragada, J. Marius Zoellner
This survey offers a comprehensive examination of collaborative perception datasets in the context of Vehicle-to-Infrastructure (V2I), Vehicle-to-Vehicle (V2V), and Vehicle-to-Everything (V2X) communication. It highlights the latest developments in large-scale benchmarks that accelerate advancements in perception tasks for autonomous vehicles. The paper systematically analyzes a variety of datasets, comparing them on aspects such as diversity, sensor setup, quality, public availability, and applicability to downstream tasks. It also highlights key challenges, such as domain shift, sensor setup limitations, and gaps in dataset diversity and availability. The survey further emphasizes the importance of addressing privacy and security concerns in data sharing and dataset creation. The conclusion underscores the need for comprehensive, globally accessible datasets and for collaborative efforts from both the technological and research communities to overcome these challenges and fully harness the potential of autonomous driving.
CV · Apr 24, 2024
A Survey on Intermediate Fusion Methods for Collaborative Perception Categorized by Real World Challenges
Melih Yazgan, Thomas Graf, Min Liu et al.
This survey analyzes intermediate fusion methods in collaborative perception for autonomous driving, categorized by real-world challenges. We examine various methods, detailing their features and the evaluation metrics they employ. The focus is on addressing challenges such as transmission efficiency, localization errors, communication disruptions, and heterogeneity. Moreover, we explore adversarial attacks and the defenses against them, as well as approaches for adapting to domain shifts. The objective is to present an overview of how intermediate fusion methods meet these diverse challenges, highlighting their role in advancing collaborative perception for autonomous driving.
CV · Aug 18, 2025
SlimComm: Doppler-Guided Sparse Queries for Bandwidth-Efficient Cooperative 3-D Perception
Melih Yazgan, Qiyuan Wu, Iramm Hamdard et al.
Collaborative perception allows connected autonomous vehicles (CAVs) to overcome occlusion and limited sensor range by sharing intermediate features. Yet transmitting dense Bird's-Eye-View (BEV) feature maps can overwhelm the bandwidth available for inter-vehicle communication. We present SlimComm, a communication-efficient framework that integrates 4D radar Doppler with a query-driven sparse scheme. SlimComm builds a motion-centric dynamic map to distinguish moving from static objects and generates two query types: (i) reference queries on dynamic and high-confidence regions, and (ii) exploratory queries probing occluded areas via a two-stage offset. Only query-specific BEV features are exchanged and fused through multi-scale gated deformable attention, reducing payload while preserving accuracy. For evaluation, we release OPV2V-R and Adver-City-R, CARLA-based datasets with per-point Doppler radar. SlimComm achieves up to 90% lower bandwidth than full-map sharing while matching or surpassing prior baselines across varied traffic densities and occlusions. Dataset and code will be available at: https://url.fzi.de/SlimComm.
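To make the idea of sharing only query-specific BEV features concrete, the sketch below selects reference queries from two BEV grids, a detection-confidence map and a per-cell Doppler map, and reports the fraction of the full map that would be sent. It is a simplified stand-in under stated assumptions: the thresholds `conf_thr` and `vel_thr`, the grid layout, and both function names are hypothetical, and exploratory queries (the two-stage offset into occluded areas) and the gated deformable attention are not modeled. The "up to 90%" bandwidth figure comes from the paper and is not reproduced here.

```python
from typing import List, Tuple


def select_reference_queries(
    confidence: List[List[float]],  # BEV detection confidence per cell (assumed layout)
    doppler: List[List[float]],     # BEV radial velocity per cell in m/s (assumed)
    conf_thr: float = 0.5,          # hypothetical confidence threshold
    vel_thr: float = 0.5,           # hypothetical speed threshold for "dynamic"
) -> List[Tuple[int, int]]:
    """Return (row, col) cells that are dynamic or confidently detected."""
    queries: List[Tuple[int, int]] = []
    for i, row in enumerate(confidence):
        for j, score in enumerate(row):
            is_dynamic = abs(doppler[i][j]) > vel_thr  # motion-centric dynamic map
            if is_dynamic or score > conf_thr:
                queries.append((i, j))
    return queries


def sparse_payload_fraction(num_queries: int, height: int, width: int) -> float:
    """Share of the dense H x W BEV map that the sparse queries cover."""
    return num_queries / (height * width)
```

On a 4 x 4 grid with one confident cell and one moving cell, only 2 of 16 cells are queried, so the payload is 12.5% of full-map sharing (ignoring the small overhead of sending cell indices).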
CV · Jul 25, 2025
EffiComm: Bandwidth Efficient Multi Agent Communication
Melih Yazgan, Allen Xavier Arasan, J. Marius Zöllner
Collaborative perception allows connected vehicles to exchange sensor information and overcome each vehicle's blind spots. Yet transmitting raw point clouds or full feature maps overwhelms Vehicle-to-Vehicle (V2V) communications, causing latency and scalability problems. We introduce EffiComm, an end-to-end framework that transmits less than 40% of the data required by prior art while maintaining state-of-the-art 3D object detection accuracy. EffiComm operates on Bird's-Eye-View (BEV) feature maps from any modality and applies a two-stage reduction pipeline: (1) Selective Transmission (ST) prunes low-utility regions with a confidence mask; (2) Adaptive Grid Reduction (AGR) uses a Graph Neural Network (GNN) to assign vehicle-specific keep ratios according to role and network load. The remaining features are fused with a soft-gated Mixture-of-Experts (MoE) attention layer, offering greater capacity and specialization for effective feature integration. On the OPV2V benchmark, EffiComm reaches 0.84 mAP@0.7 while sending only an average of approximately 1.5 MB per frame, outperforming previous methods on the accuracy-per-bit curve. These results highlight the value of adaptive, learned communication for scalable Vehicle-to-Everything (V2X) perception.
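The two-stage reduction pipeline can be sketched on a flattened BEV map as follows. This is a minimal illustration under assumptions: Selective Transmission is modeled as a fixed confidence threshold, and the GNN that assigns vehicle-specific keep ratios in Adaptive Grid Reduction is replaced by a `keep_ratio` passed in directly, with surviving cells ranked by confidence. All function names, the threshold, the channel count, and the 4-byte float size are hypothetical; the MoE fusion layer is not modeled, and the 1.5 MB and 0.84 mAP@0.7 figures are the paper's results, not outputs of this toy.

```python
from typing import List


def selective_transmission(confidence: List[float], threshold: float) -> List[int]:
    """Stage 1 (ST stand-in): indices of BEV cells that pass the confidence mask."""
    return [i for i, c in enumerate(confidence) if c >= threshold]


def adaptive_grid_reduction(
    confidence: List[float], candidates: List[int], keep_ratio: float
) -> List[int]:
    """Stage 2 (AGR stand-in): keep the top keep_ratio share of surviving cells.

    EffiComm predicts the ratio per vehicle with a GNN; here it is given directly.
    """
    k = int(len(candidates) * keep_ratio)  # floor of the number of cells to keep
    ranked = sorted(candidates, key=lambda i: confidence[i], reverse=True)
    return sorted(ranked[:k])  # restore spatial order for transmission


def payload_bytes(num_cells: int, channels: int, bytes_per_value: int = 4) -> int:
    """Raw feature payload, assuming one float32 value per channel per cell."""
    return num_cells * channels * bytes_per_value
```

For example, with confidences `[0.9, 0.2, 0.7, 0.05, 0.6, 0.8]` and a threshold of 0.5, ST keeps cells `[0, 2, 4, 5]`; a keep ratio of 0.5 then retains `[0, 5]`, which is 512 bytes at 64 channels. A vehicle assigned a lower ratio under heavy network load would simply pass a smaller `keep_ratio`.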