Steffen Müller

CV
h-index: 4
Papers: 8
Citations: 30
Novelty: 55%
AI Score: 45

8 Papers

RO · Sep 24, 2023
PanopticNDT: Efficient and Robust Panoptic Mapping

Daniel Seichter, Benedict Stephan, Söhnke Benedikt Fischedick et al.

As the application scenarios of mobile robots are getting more complex and challenging, scene understanding becomes increasingly crucial. A mobile robot that is supposed to operate autonomously in indoor environments must have precise knowledge about what objects are present, where they are, what their spatial extent is, and how they can be reached; i.e., information about free space is also crucial. Panoptic mapping is a powerful instrument providing such information. However, building 3D panoptic maps with high spatial resolution is challenging on mobile robots, given their limited computing capabilities. In this paper, we propose PanopticNDT - an efficient and robust panoptic mapping approach based on occupancy normal distribution transform (NDT) mapping. We evaluate our approach on the publicly available datasets Hypersim and ScanNetV2. The results reveal that our approach can represent panoptic information at a higher level of detail than other state-of-the-art approaches while enabling real-time panoptic mapping on mobile robots. Finally, we prove the real-world applicability of PanopticNDT with qualitative results in a domestic application.
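
The abstract names normal distribution transform (NDT) mapping as the underlying representation. As a rough illustration of the general NDT idea only, the sketch below groups 3D points into cubic cells and fits one Gaussian (mean and covariance) per cell. It is not the authors' implementation: the function name `build_ndt_cells`, the parameters `cell_size` and `min_points`, and their default values are assumptions, and occupancy updates and panoptic labels are omitted entirely.

```python
import numpy as np
from collections import defaultdict


def build_ndt_cells(points, cell_size=0.5, min_points=3):
    """Group 3D points into cubic cells and fit one Gaussian per cell.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    points = np.asarray(points, dtype=float)
    buckets = defaultdict(list)
    # Integer cell index of each point (floor of coordinate / cell edge length).
    indices = np.floor(points / cell_size).astype(int)
    for point, index in zip(points, indices):
        buckets[tuple(int(i) for i in index)].append(point)

    cells = {}
    for index, members in buckets.items():
        if len(members) < min_points:
            continue  # too few points for a meaningful covariance estimate
        members = np.stack(members)
        mean = members.mean(axis=0)
        cov = np.cov(members, rowvar=False)  # 3x3 sample covariance
        cells[index] = (mean, cov)
    return cells
```

Each cell then stores a compact Gaussian summary instead of the raw points, which is what makes NDT-style maps attractive when memory and compute are limited.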

CV · Mar 28, 2024 · Code
Situation Awareness for Driver-Centric Driving Style Adaptation

Johann Haselberger, Bonifaz Stuhr, Bernhard Schick et al.

There is evidence that the driving style of an autonomous vehicle is important to increase the acceptance and trust of the passengers. The driving situation has been found to have a significant influence on human driving behavior. However, current driving style models only partially incorporate driving environment information, limiting the alignment between an agent and the given situation. Therefore, we propose a situation-aware driving style model based on different visual feature encoders pretrained on fleet data, as well as driving behavior predictors, which are adapted to the driving style of a specific driver. Our experiments show that the proposed method outperforms static driving styles significantly and forms plausible situation clusters. Furthermore, we found that feature encoders pretrained on our dataset lead to more precise driving behavior modeling. In contrast, feature encoders pretrained supervised and unsupervised on different data sources lead to more specific situation clusters, which can be utilized to constrain and control the driving style adaptation for specific situations. Moreover, in a real-world setting, where driving style adaptation happens iteratively, we found that the MLP-based behavior predictors achieve good performance initially but suffer from catastrophic forgetting. In contrast, behavior predictors based on situation-dependent statistics can learn iteratively from continuous data streams by design. Overall, our experiments show that important information for driving behavior prediction is contained within the visual feature encoder. The dataset is publicly available at huggingface.co/datasets/jHaselberger/SADC-Situation-Awareness-for-Driver-Centric-Driving-Style-Adaptation.

CV · Nov 25, 2025 · Code
Multi-Context Fusion Transformer for Pedestrian Crossing Intention Prediction in Urban Environments

Yuanzhe Li, Hang Zhong, Steffen Müller

Pedestrian crossing intention prediction is essential for autonomous vehicles to improve pedestrian safety and reduce traffic accidents. However, accurate pedestrian intention prediction in urban environments remains challenging due to the multitude of factors affecting pedestrian behavior. In this paper, we propose a multi-context fusion Transformer (MFT) that leverages diverse numerical contextual attributes across four key dimensions, encompassing pedestrian behavior context, environmental context, pedestrian localization context and vehicle motion context, to enable accurate pedestrian intention prediction. MFT employs a progressive fusion strategy, where mutual intra-context attention enables reciprocal interactions within each context, thereby facilitating feature sequence fusion and yielding a context token as a context-specific representation. This is followed by mutual cross-context attention, which integrates features across contexts with a global CLS token serving as a compact multi-context representation. Finally, guided intra-context attention refines context tokens within each context through directed interactions, while guided cross-context attention strengthens the global CLS token to promote multi-context fusion via guided information propagation, yielding deeper and more efficient integration. Experimental results validate the superiority of MFT over state-of-the-art methods, achieving accuracy rates of 73%, 93%, and 90% on the JAADbeh, JAADall, and PIE datasets, respectively. Extensive ablation studies are further conducted to investigate the effectiveness of the network architecture and the contribution of the different input contexts. Our code is open-source: https://github.com/ZhongHang0307/Multi-Context-Fusion-Transformer.

LG · Feb 2, 2023
Vectorized Scenario Description and Motion Prediction for Scenario-Based Testing

Max Winkelmann, Constantin Vasconi, Steffen Müller

Automated vehicles (AVs) are tested in diverse scenarios, typically specified by parameters such as velocities, distances, or curve radii. To describe scenarios uniformly and independently of such parameters, this paper proposes a vectorized scenario description defined by the road geometry and vehicles' trajectories. Data of this form are generated for three scenarios, merged, and used to train the motion prediction model VectorNet, allowing it to predict an AV's trajectory for unseen scenarios. When predicting scenario evaluation metrics, VectorNet partially achieves lower errors than regression models that separately process the three scenarios' data. However, for comprehensive generalization, sufficient variance in the training data must be ensured. Thus, contrary to existing methods, our proposed method can merge diverse scenarios' data and exploit spatial and temporal nuances in the vectorized scenario description. As a result, data from specified test scenarios and real-world scenarios can be compared and combined for (predictive) analyses and scenario selection.

CV · Nov 25, 2025
ACIT: Attention-Guided Cross-Modal Interaction Transformer for Pedestrian Crossing Intention Prediction

Yuanzhe Li, Steffen Müller

Predicting pedestrian crossing intention is crucial for autonomous vehicles to prevent pedestrian-related collisions. However, effectively extracting and integrating complementary cues from different types of data remains one of the major challenges. This paper proposes an attention-guided cross-modal interaction Transformer (ACIT) for pedestrian crossing intention prediction. ACIT leverages six visual and motion modalities, which are grouped into three interaction pairs: (1) Global semantic map and global optical flow, (2) Local RGB image and local optical flow, and (3) Ego-vehicle speed and pedestrian's bounding box. Within each visual interaction pair, a dual-path attention mechanism enhances salient regions within the primary modality through intra-modal self-attention and facilitates deep interactions with the auxiliary modality (i.e., optical flow) via optical flow-guided attention. Within the motion interaction pair, cross-modal attention is employed to model the cross-modal dynamics, enabling the effective extraction of complementary motion features. Beyond pairwise interactions, a multi-modal feature fusion module further facilitates cross-modal interactions at each time step. Furthermore, a Transformer-based temporal feature aggregation module is introduced to capture sequential dependencies. Experimental results demonstrate that ACIT outperforms state-of-the-art methods, achieving accuracy rates of 70% and 89% on the JAADbeh and JAADall datasets, respectively. Extensive ablation studies are further conducted to investigate the contribution of different modules of ACIT.
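
The abstract describes attention in which one modality guides another. The sketch below shows generic scaled dot-product cross-attention, where queries come from a primary modality and keys and values from an auxiliary one. This is a standard building block, not the ACIT architecture itself: the function names, the projection matrices `w_q`, `w_k`, `w_v`, and the choice of which modality supplies the queries are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(primary, auxiliary, w_q, w_k, w_v):
    """Single-head cross-attention (hypothetical helper, not from the paper).

    primary:   (n_primary, d_model) tokens that issue the queries
    auxiliary: (n_auxiliary, d_model) tokens that supply keys and values
    """
    q = primary @ w_q      # (n_primary, d_head)
    k = auxiliary @ w_k    # (n_auxiliary, d_head)
    v = auxiliary @ w_v    # (n_auxiliary, d_head)
    # Similarity of every primary token to every auxiliary token.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights
```

Each output row is a weighted average of auxiliary value vectors, so the primary tokens are enriched with whatever the auxiliary modality considers relevant to them.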

CV · Nov 25, 2025
Pedestrian Crossing Intention Prediction Using Multimodal Fusion Network

Yuanzhe Li, Steffen Müller

Pedestrian crossing intention prediction is essential for the deployment of autonomous vehicles (AVs) in urban environments. Ideal prediction provides AVs with critical environmental cues, thereby reducing the risk of pedestrian-related collisions. However, the prediction task is challenging due to the diverse nature of pedestrian behavior and its dependence on multiple contextual factors. This paper proposes a multimodal fusion network that leverages seven modality features from both visual and motion branches, aiming to effectively extract and integrate complementary cues across different modalities. Specifically, motion and visual features are extracted from the raw inputs using multiple Transformer-based extraction modules. A depth-guided attention module leverages depth information to guide attention towards salient regions in another modality through comprehensive spatial feature interactions. To account for the varying importance of different modalities and frames, modality attention and temporal attention are designed to selectively emphasize informative modalities and effectively capture temporal dependencies. Extensive experiments on the JAAD dataset validate the effectiveness of the proposed network, achieving superior performance compared to the baseline methods.

LG · Oct 6, 2021
Probabilistic Metamodels for an Efficient Characterization of Complex Driving Scenarios

Max Winkelmann, Mike Kohlhoff, Hadj Hamma Tadjine et al.

To validate the safety of automated vehicles (AVs), scenario-based testing aims to systematically describe driving scenarios an AV might encounter. In this process, continuous inputs such as velocities result in an infinite number of possible variations of a scenario. Thus, metamodels are used to perform analyses or to select specific variations for examination. However, despite the safety criticality of AV testing, metamodels are usually seen as a part of an overall approach, and their predictions are not questioned. This paper analyzes the predictive performance of Gaussian processes (GP), deep Gaussian processes, extra-trees, and Bayesian neural networks (BNN), considering four scenarios with 5 to 20 inputs. Building on this, an iterative approach is introduced and evaluated, which allows efficient selection of test cases for common analysis tasks. The results show that regarding predictive performance, the appropriate selection of test cases is more important than the choice of metamodels. However, the choice of metamodels remains crucial: Their great flexibility allows BNNs to benefit from large amounts of data and to model even the most complex scenarios. In contrast, less flexible models like GPs convince with higher reliability. Hence, relevant test cases are best explored using scalable virtual test setups and flexible models. Subsequently, more realistic test setups and more reliable models can be used for targeted testing and validation.

SY · Sep 28, 2020
Robust Model Predictive Longitudinal Position Tracking Control for an Autonomous Vehicle Based on Multiple Models

André Kempf, Markus Herrmann-Wicklmayr, Steffen Müller

The aim of this work is to control the longitudinal position of an autonomous vehicle with an internal combustion engine. The powertrain has an inherent dead-time characteristic, and constraints on physical states apply, since the vehicle can neither accelerate arbitrarily strongly nor drive arbitrarily fast. A model predictive controller (MPC) is able to cope with both of the aforementioned system properties. MPC heavily relies on a model, and therefore a strategy is given for obtaining multiple linear state space prediction models of the nonlinear system via input/output data system identification from acceleration data. The models are identified in different regions of the vehicle dynamics in order to obtain more accurate predictions. The remaining plant-model mismatch can be expressed as an additive disturbance, which can be handled through robust control theory. Therefore, modifications to the models for applying robust MPC tracking control theory are described. Then a controller which guarantees robust constraint satisfaction and recursive feasibility is designed. As a next step, modifications to apply the controller on multiple models are discussed. In this context, a model switching strategy is provided and theoretical and computational limitations are pointed out. Lastly, simulation results are presented and discussed, including computational load when switching between systems.
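
The abstract mentions linear state space prediction models for a plant with dead time. One textbook way to handle an input delay in a discrete-time linear model is to augment the state with the inputs that are still "in transit". The sketch below illustrates that standard construction only; whether the paper uses it is not stated in the abstract, and the function name `augment_with_input_delay` and the delay parameter `d` are assumptions.

```python
import numpy as np


def augment_with_input_delay(A, B, d):
    """Rewrite x[k+1] = A x[k] + B u[k-d] as a delay-free system.

    The augmented state is z[k] = [x[k], u[k-d], ..., u[k-1]], so that
    z[k+1] = A_aug z[k] + B_aug u[k]. Hypothetical helper for illustration.
    """
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    n, m = B.shape
    if d == 0:
        return A, B

    size = n + d * m
    A_aug = np.zeros((size, size))
    B_aug = np.zeros((size, m))
    A_aug[:n, :n] = A
    A_aug[:n, n:n + m] = B  # the oldest stored input acts on the plant
    # Shift register: each stored input moves one slot towards the plant.
    for i in range(d - 1):
        rows = slice(n + i * m, n + (i + 1) * m)
        cols = slice(n + (i + 1) * m, n + (i + 2) * m)
        A_aug[rows, cols] = np.eye(m)
    B_aug[n + (d - 1) * m:, :] = np.eye(m)  # the new input enters the last slot
    return A_aug, B_aug
```

With the delay absorbed into the state, a standard MPC formulation for delay-free linear systems can be applied to the augmented model, at the cost of a larger state vector.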