CVApr 13, 2023Code
RadarGNN: Transformation Invariant Graph Neural Network for Radar-based PerceptionFelix Fent, Philipp Bauerschmidt, Markus Lienkamp
A reliable perception has to be robust against challenging environmental conditions. Therefore, recent efforts focused on the use of radar sensors in addition to camera and lidar sensors for perception applications. However, the sparsity of radar point clouds and the poor data availability remain challenging for current perception methods. To address these challenges, a novel graph neural network is proposed that does not just use the information of the points themselves but also the relationships between the points. The model is designed to consider both point features and point-pair features, embedded in the edges of the graph. Furthermore, a general approach for achieving transformation invariance is proposed which is robust against unseen scenarios and also counteracts the limited data availability. The transformation invariance is achieved by an invariant data representation rather than an invariant model architecture, making it applicable to other methods. The proposed RadarGNN model outperforms all previous methods on the RadarScenes dataset. In addition, the effects of different invariances on the object detection and semantic segmentation quality are investigated. The code is made available as open-source software under https://github.com/TUMFTM/RadarGNN.
CVJul 10, 2024
MAN TruckScenes: A multimodal dataset for autonomous trucking in diverse conditionsFelix Fent, Fabian Kuttenreich, Florian Ruch et al.
Autonomous trucking is a promising technology that can greatly impact modern logistics and the environment. Ensuring its safety on public roads is one of the main duties that requires an accurate perception of the environment. To achieve this, machine learning methods rely on large datasets, but to this day, no such datasets are available for autonomous trucks. In this work, we present MAN TruckScenes, the first multimodal dataset for autonomous trucking. MAN TruckScenes allows the research community to come into contact with truck-specific challenges, such as trailer occlusions, novel sensor perspectives, and terminal environments for the first time. It comprises more than 740 scenes of 20s each within a multitude of different environmental conditions. The sensor set includes 4 cameras, 6 lidar, 6 radar sensors, 2 IMUs, and a high-precision GNSS. The dataset's 3D bounding boxes were manually annotated and carefully reviewed to achieve a high quality standard. Bounding boxes are available for 27 object classes, 15 attributes, and a range of more than 230m. The scenes are tagged according to 34 distinct scene tags, and all objects are tracked throughout the scene to promote a wide range of applications. Additionally, MAN TruckScenes is the first dataset to provide 4D radar data with 360° coverage and is thereby the largest radar dataset with annotated 3D bounding boxes. Finally, we provide extensive dataset analysis and baseline results. The dataset, development kit, and more are available online.
LGAug 9, 2023
Evaluating Pedestrian Trajectory Prediction Methods with Respect to Autonomous DrivingNico Uhlemann, Felix Fent, Markus Lienkamp
In this paper, we assess the state of the art in pedestrian trajectory prediction within the context of generating single trajectories, a critical aspect aligning with the requirements in autonomous systems. The evaluation is conducted on the widely-used ETH/UCY dataset where the Average Displacement Error (ADE) and the Final Displacement Error (FDE) are reported. Alongside this, we perform an ablation study to investigate the impact of the observed motion history on prediction performance. To evaluate the scalability of each approach when confronted with varying amounts of agents, the inference time of each model is measured. Following a quantitative analysis, the resulting predictions are compared in a qualitative manner, giving insight into the strengths and weaknesses of current approaches. The results demonstrate that although a constant velocity model (CVM) provides a good approximation of the overall dynamics in the majority of cases, additional features need to be incorporated to reflect common pedestrian behavior observed. Therefore, this study presents a data-driven analysis with the intent to guide the future development of pedestrian trajectory prediction algorithms.
CVApr 3, 2024Code
DPFT: Dual Perspective Fusion Transformer for Camera-Radar-based Object DetectionFelix Fent, Andras Palffy, Holger Caesar
The perception of autonomous vehicles has to be efficient, robust, and cost-effective. However, cameras are not robust against severe weather conditions, lidar sensors are expensive, and the performance of radar-based perception is still inferior to the others. Camera-radar fusion methods have been proposed to address this issue, but these are constrained by the typical sparsity of radar point clouds and often designed for radars without elevation information. We propose a novel camera-radar fusion approach called Dual Perspective Fusion Transformer (DPFT), designed to overcome these limitations. Our method leverages lower-level radar data (the radar cube) instead of the processed point clouds to preserve as much information as possible and employs projections in both the camera and ground planes to effectively use radars with elevation information and simplify the fusion with camera data. As a result, DPFT has demonstrated state-of-the-art performance on the K-Radar dataset while showing remarkable robustness against adverse weather conditions and maintaining a low inference time. The code is made available as open-source software under https://github.com/TUMFTM/DPFT.
CVNov 29, 2024Code
SpaRC: Sparse Radar-Camera Fusion for 3D Object DetectionPhilipp Wolters, Johannes Gilg, Torben Teepe et al.
In this work, we present SpaRC, a novel Sparse fusion transformer for 3D perception that integrates multi-view image semantics with Radar and Camera point features. The fusion of radar and camera modalities has emerged as an efficient perception paradigm for autonomous driving systems. While conventional approaches utilize dense Bird's Eye View (BEV)-based architectures for depth estimation, contemporary query-based transformers excel in camera-only detection through object-centric methodology. However, these query-based approaches exhibit limitations in false positive detections and localization precision due to implicit depth modeling. We address these challenges through three key contributions: (1) sparse frustum fusion (SFF) for cross-modal feature alignment, (2) range-adaptive radar aggregation (RAR) for precise object localization, and (3) local self-attention (LSA) for focused query aggregation. In contrast to existing methods requiring computationally intensive BEV-grid rendering, SpaRC operates directly on encoded point features, yielding substantial improvements in efficiency and accuracy. Empirical evaluations on the nuScenes and TruckScenes benchmarks demonstrate that SpaRC significantly outperforms existing dense BEV-based and sparse query-based detectors. Our method achieves state-of-the-art performance metrics of 67.1 NDS and 63.1 AMOTA. The code and pretrained models are available at https://github.com/phi-wol/sparc.
CVAug 28, 2025Code
To New Beginnings: A Survey of Unified Perception in Autonomous Vehicle SoftwareLoïc Stratil, Felix Fent, Esteban Rivera et al.
Autonomous vehicle perception typically relies on modular pipelines that decompose the task into detection, tracking, and prediction. While interpretable, these pipelines suffer from error accumulation and limited inter-task synergy. Unified perception has emerged as a promising paradigm that integrates these sub-tasks within a shared architecture, potentially improving robustness, contextual reasoning, and efficiency while retaining interpretable outputs. In this survey, we provide a comprehensive overview of unified perception, introducing a holistic and systemic taxonomy that categorizes methods along task integration, tracking formulation, and representation flow. We define three paradigms -Early, Late, and Full Unified Perception- and systematically review existing methods, their architectures, training strategies, datasets used, and open-source availability, while highlighting future research directions. This work establishes the first comprehensive framework for understanding and advancing unified perception, consolidates fragmented efforts, and guides future research toward more robust, generalizable, and interpretable perception.
CVMar 31, 2025Code
Cal or No Cal? -- Real-Time Miscalibration Detection of LiDAR and Camera SensorsIlir Tahiraj, Jeremialie Swadiryus, Felix Fent et al.
The goal of extrinsic calibration is the alignment of sensor data to ensure an accurate representation of the surroundings and enable sensor fusion applications. From a safety perspective, sensor calibration is a key enabler of autonomous driving. In the current state of the art, a trend from target-based offline calibration towards targetless online calibration can be observed. However, online calibration is subject to strict real-time and resource constraints which are not met by state-of-the-art methods. This is mainly due to the high number of parameters to estimate, the reliance on geometric features, or the dependence on specific vehicle maneuvers. To meet these requirements and ensure the vehicle's safety at any time, we propose a miscalibration detection framework that shifts the focus from the direct regression of calibration parameters to a binary classification of the calibration state, i.e., calibrated or miscalibrated. Therefore, we propose a contrastive learning approach that compares embedded features in a latent space to classify the calibration state of two different sensor modalities. Moreover, we provide a comprehensive analysis of the feature embeddings and challenging calibration errors that highlight the performance of our approach. As a result, our method outperforms the current state-of-the-art in terms of detection performance, inference time, and resource demand. The code is open source and available on https://github.com/TUMFTM/MiscalibrationDetection.
ROFeb 8, 2022
Indy Autonomous Challenge -- Autonomous Race Cars at the Handling LimitsAlexander Wischnewski, Maximilian Geisslinger, Johannes Betz et al.
Motorsport has always been an enabler for technological advancement, and the same applies to the autonomous driving industry. The team TUM Auton-omous Motorsports will participate in the Indy Autonomous Challenge in Octo-ber 2021 to benchmark its self-driving software-stack by racing one out of ten autonomous Dallara AV-21 racecars at the Indianapolis Motor Speedway. The first part of this paper explains the reasons for entering an autonomous vehicle race from an academic perspective: It allows focusing on several edge cases en-countered by autonomous vehicles, such as challenging evasion maneuvers and unstructured scenarios. At the same time, it is inherently safe due to the motor-sport related track safety precautions. It is therefore an ideal testing ground for the development of autonomous driving algorithms capable of mastering the most challenging and rare situations. In addition, we provide insight into our soft-ware development workflow and present our Hardware-in-the-Loop simulation setup. It is capable of running simulations of up to eight autonomous vehicles in real time. The second part of the paper gives a high-level overview of the soft-ware architecture and covers our development priorities in building a high-per-formance autonomous racing software: maximum sensor detection range, relia-ble handling of multi-vehicle situations, as well as reliable motion control under uncertainty.