Daniel Watzenig

h-index: 21

8 papers

49 citations

Novelty: 40%

AI Score: 29

Ranked #144,732 of 194,257 authors (top 75%); #47,468 in CV (top 80%)

8 Papers

7.6 | CV | Aug 19, 2024 | Code

Segment-Anything Models Achieve Zero-shot Robustness in Autonomous Driving

Jun Yan, Pengyu Wang, Danni Wang et al.

Semantic segmentation is a significant perception task in autonomous driving. It suffers from the risks of adversarial examples. In the past few years, deep learning has gradually transitioned from convolutional neural network (CNN) models with a relatively small number of parameters to foundation models with a huge number of parameters. The segment-anything model (SAM) is a generalized image segmentation framework that is capable of handling various types of images and is able to recognize and segment arbitrary objects in an image without the need to train on a specific object. It is a unified model that can handle diverse downstream tasks, including semantic segmentation, object detection, and tracking. In the task of semantic segmentation for autonomous driving, it is significant to study the zero-shot adversarial robustness of SAM. Therefore, we deliver a systematic empirical study on the robustness of SAM without additional training. Based on the experimental results, the zero-shot adversarial robustness of the SAM under the black-box corruptions and white-box adversarial attacks is acceptable, even without the need for additional training. The finding of this study is insightful in that the gigantic model parameters and huge amounts of training data lead to the phenomenon of emergence, which builds a guarantee of adversarial robustness. SAM is a vision foundation model that can be regarded as an early prototype of an artificial general intelligence (AGI) pipeline. In such a pipeline, a unified model can handle diverse tasks. Therefore, this research not only inspects the impact of vision foundation models on safe autonomous driving but also provides a perspective on developing trustworthy AGI. The code is available at: https://github.com/momo1986/robust_sam_iv.

2.2 | RO | Apr 19, 2024

Random Network Distillation Based Deep Reinforcement Learning for AGV Path Planning

Huilin Yin, Shengkai Su, Yinjia Lin et al.

With the flourishing development of intelligent warehousing systems, Automated Guided Vehicle (AGV) technology has experienced rapid growth. Within intelligent warehousing environments, an AGV is required to safely and rapidly plan an optimal path in complex and dynamic environments. Most research has studied deep reinforcement learning to address this challenge. However, in environments with sparse extrinsic rewards, these algorithms often converge slowly, learn inefficiently, or fail to reach the target. Random Network Distillation (RND), as an exploration enhancement, can effectively improve the performance of proximal policy optimization, especially by providing additional intrinsic rewards to the AGV agent in sparse-reward environments. Moreover, most current research continues to use 2D grid mazes as experimental environments. These environments have insufficient complexity and limited action sets. To address this limitation, we present simulation environments for AGV path planning with continuous actions and positions, so that they are closer to realistic physical scenarios. Based on our experiments and comprehensive analysis of the proposed method, the results demonstrate that our proposed method enables the AGV to complete path planning tasks with continuous actions more rapidly in our environments. A video of part of our experiments can be found at https://youtu.be/lwrY9YesGmw.
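For readers unfamiliar with RND, the idea of an intrinsic reward can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name `RNDBonus`, the single-layer networks, the feature size, and the learning rate are all assumptions made for brevity. A fixed random target network embeds each observation, a predictor is trained to match it, and the prediction error serves as an exploration bonus that shrinks for states the agent has already visited.

```python
import numpy as np


class RNDBonus:
    """Hypothetical minimal RND: fixed random target, trained linear predictor."""

    def __init__(self, obs_dim, feat_dim=16, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        # Target network weights are random and never updated.
        self.W_target = rng.normal(size=(obs_dim, feat_dim))
        # Predictor starts at zero and is trained on visited observations.
        self.W_pred = np.zeros((obs_dim, feat_dim))
        self.lr = lr

    def _error(self, obs):
        # Difference between predictor output and target embedding.
        return obs @ self.W_pred - np.tanh(obs @ self.W_target)

    def bonus(self, obs):
        # Intrinsic reward: mean squared prediction error for this observation.
        return float(np.mean(self._error(obs) ** 2))

    def update(self, obs):
        # One gradient-descent step on the mean squared error w.r.t. W_pred.
        err = self._error(obs)
        grad = np.outer(obs, err) * (2.0 / err.size)
        self.W_pred -= self.lr * grad


rnd = RNDBonus(obs_dim=4)
familiar = np.array([0.5, -0.2, 0.1, 0.3])  # state visited many times
novel = np.array([0.2, 0.5, 0.0, 0.0])      # state never visited

bonus_before = rnd.bonus(familiar)
for _ in range(200):
    rnd.update(familiar)
bonus_after = rnd.bonus(familiar)
bonus_novel = rnd.bonus(novel)
```

In a full agent, this bonus would typically be scaled and added to the extrinsic reward before the policy update; how the paper combines the two is not stated in the abstract.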

3.6 | CV | Apr 25, 2025

A Data-Centric Approach to 3D Semantic Segmentation of Railway Scenes

Nicolas Münger, Max Peter Ronecker, Xavier Diaz et al.

LiDAR-based semantic segmentation is critical for autonomous trains, requiring accurate predictions across varying distances. This paper introduces two targeted data augmentation methods designed to improve segmentation performance on the railway-specific OSDaR23 dataset. The person instance pasting method enhances segmentation of pedestrians at distant ranges by injecting realistic variations into the dataset. The track sparsification method redistributes point density in LiDAR scans, improving track segmentation at far distances with minimal impact on close-range accuracy. Both methods are evaluated using a state-of-the-art 3D semantic segmentation network, demonstrating significant improvements in distant-range performance while maintaining robustness in close-range predictions. We establish the first 3D semantic segmentation benchmark for OSDaR23, demonstrating the potential of data-centric approaches to address railway-specific challenges in autonomous train perception.

3.6 | CV | Apr 25, 2025

LiDAR-Guided Monocular 3D Object Detection for Long-Range Railway Monitoring

Raul David Dominguez Sanchez, Xavier Diaz Ortiz, Xingcheng Zhou et al.

Railway systems, particularly in Germany, require high levels of automation to address legacy infrastructure challenges and increase train traffic safely. A key component of automation is robust long-range perception, essential for early hazard detection, such as obstacles at level crossings or pedestrians on tracks. Unlike automotive systems with braking distances of ~70 meters, trains require perception ranges exceeding 1 km. This paper presents a deep-learning-based approach for long-range 3D object detection tailored for autonomous trains. The method relies solely on monocular images, inspired by the Faraway-Frustum approach, and incorporates LiDAR data during training to improve depth estimation. The proposed pipeline consists of four key modules: (1) a modified YOLOv9 for 2.5D object detection, (2) a depth estimation network, and (3-4) dedicated short- and long-range 3D detection heads. Evaluations on the OSDaR23 dataset demonstrate the effectiveness of the approach in detecting objects up to 250 meters. Results highlight its potential for railway automation and outline areas for future improvement.

4.1 | LG | Apr 7, 2025

Attention-Augmented Inverse Reinforcement Learning with Graph Convolutions for Multi-Agent Task Allocation

Huilin Yin, Zhikun Yang, Linchuan Zhang et al.

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. Multi-agent task allocation (MATA) plays a vital role in cooperative multi-agent systems, with significant implications for applications such as logistics, search and rescue, and robotic coordination. Although traditional deep reinforcement learning (DRL) methods have been shown to be promising, their effectiveness is hindered by a reliance on manually designed reward functions and inefficiencies in dynamic environments. In this paper, an inverse reinforcement learning (IRL)-based framework is proposed, in which multi-head self-attention (MHSA) and graph attention mechanisms are incorporated to enhance reward function learning and task execution efficiency. Expert demonstrations are utilized to infer optimal reward densities, allowing dependence on handcrafted designs to be reduced and adaptability to be improved. Extensive experiments validate the superiority of the proposed method over widely used multi-agent reinforcement learning (MARL) algorithms in terms of both cumulative rewards and task execution efficiency.

4.1 | RO | Apr 2, 2020

Extraction and Assessment of Naturalistic Human Driving Trajectories from Infrastructure Camera and Radar Sensors

Dominik Notz, Felix Becker, Thomas Kühbeck et al.

Collecting realistic driving trajectories is crucial for training machine learning models that imitate human driving behavior. Most of today's autonomous driving datasets contain only a few trajectories per location and are recorded with test vehicles that are cautiously driven by trained drivers. In particular in interactive scenarios such as highway merges, the test driver's behavior significantly influences other vehicles. This influence prevents recording the whole traffic space of human driving behavior. In this work, we present a novel methodology to extract trajectories of traffic objects using infrastructure sensors. Infrastructure sensors allow us to record a lot of data for one location and take the test drivers out of the loop. We develop both a hardware setup consisting of a camera and a traffic surveillance radar and a trajectory extraction algorithm. Our vision pipeline accurately detects objects, fuses camera and radar detections and tracks them over time. We improve a state-of-the-art object tracker by combining the tracking in image coordinates with a Kalman filter in road coordinates. We show that our sensor fusion approach successfully combines the advantages of camera and radar detections and outperforms either single sensor. Finally, we also evaluate the accuracy of our trajectory extraction pipeline. For that, we equip our test vehicle with a differential GPS sensor and use it to collect ground truth trajectories. With this data we compute the measurement errors. While we use the mean error to de-bias the trajectories, the error standard deviation is in the magnitude of the ground truth data inaccuracy. Hence, the extracted trajectories are not only naturalistic but also highly accurate and prove the potential of using infrastructure sensors to extract real-world trajectories.
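The Kalman filter in road coordinates mentioned above can be illustrated with a generic constant-velocity filter along a single road axis. This is a sketch under stated assumptions, not the authors' tracker: the function name `kalman_step`, the two-element state (position, velocity), the position-only measurement, and the noise parameters `q` and `r` are all chosen here for illustration.

```python
import numpy as np


def kalman_step(x, P, z, dt=0.1, q=1.0, r=0.5):
    """One predict/update cycle of a 1D constant-velocity Kalman filter.

    x: state [position, velocity] along the road axis (assumed layout)
    P: 2x2 state covariance
    z: measured position (scalar)
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])           # constant-velocity motion model
    H = np.array([[1.0, 0.0]])                      # only position is measured
    Q = q * np.array([[dt**4 / 4, dt**3 / 2],
                      [dt**3 / 2, dt**2]])          # process noise (assumed form)
    R = np.array([[r]])                             # measurement noise variance

    # Predict the next state and its covariance.
    x = F @ x
    P = F @ P @ F.T + Q

    # Correct the prediction with the measurement.
    y = np.atleast_1d(z) - H @ x                    # innovation
    S = H @ P @ H.T + R                             # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)                  # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P


# Track a vehicle moving at a constant 20 m/s from noiseless position readings.
x_est = np.array([0.0, 0.0])
P_est = 100.0 * np.eye(2)
for k in range(1, 101):
    x_est, P_est = kalman_step(x_est, P_est, z=20.0 * 0.1 * k)
```

Filtering in road coordinates lets the motion model express physically meaningful quantities such as speed in metres per second, which a model in pixel coordinates cannot do directly because of perspective distortion.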

8.3 | RO | Apr 25, 2019 | Code

Pedestrian Collision Avoidance System for Scenarios with Occlusions

Markus Schratter, Maxime Bouton, Mykel J. Kochenderfer et al.

Safe autonomous driving in urban areas requires robust algorithms to avoid collisions with other traffic participants with limited perception ability. Currently deployed approaches relying on Autonomous Emergency Braking (AEB) systems are often overly conservative. In this work, we formulate the problem as a partially observable Markov decision process (POMDP) to derive a policy robust to uncertainty in the pedestrian location. We investigate how to integrate such a policy with an AEB system that operates only when a collision is unavoidable. In addition, we propose a rigorous evaluation methodology on a set of well-defined scenarios. We show that combining the two approaches provides a robust autonomous braking system that reduces unnecessary braking caused by using the AEB system on its own.

2.2 | LG | May 25, 2018

Safe learning-based optimal motion planning for automated driving

Zlatan Ajanovic, Bakir Lacevic, Georg Stettinger et al.

This paper presents preliminary work on learning the search heuristic for optimal motion planning for automated driving in urban traffic. Previous work considered a search-based optimal motion planning framework (SBOMP) that utilized numerical or model-based heuristics that did not consider dynamic obstacles. An optimal solution was still guaranteed since dynamic obstacles can only increase the cost. However, significant variations in search efficiency are observed depending on whether dynamic obstacles are present or not. This paper introduces a machine learning (ML)-based heuristic that takes dynamic obstacles into account, thus adding to the performance consistency needed for real-time implementation.