OCMay 11
Decentralized Contingency MPC based on Safe Sets for Nonlinear Multi-agent Collision AvoidanceMax Studt, Georg Schildbach
Decentralized collision avoidance remains challenging, particularly when agents do not communicate any information related to planned trajectories. Most existing approaches either rely on conservative coordination mechanisms or provide limited guarantees on recursive feasibility and convergence. This paper develops a decentralized contingency MPC framework for multi-agent systems with nonlinear dynamics that achieves collision-free motion under a state-only information pattern. Each agent follows the same consensual rule set, enabling safe decentralized planning without communication. Each agent solves a local optimization problem that couples a nominal trajectory with a contingency certificate ensuring a feasible backup maneuver under receding-horizon operation. A novel geometric and decentralized safe-set update mechanism prevents feasibility loss between consecutive time steps. The resulting scheme guarantees recursive feasibility, including collision avoidance, and establishes a Lyapunov-type convergence result to an admissible safe equilibrium. Simulation results demonstrate performance in both sparse and dense multi-agent environments, including cluttered bottleneck scenarios and under plug-and-play operation.
ROMar 4
Self-adapting Robotic Agents through Online Continual Reinforcement Learning with World Model FeedbackFabian Domberg, Georg Schildbach
As learning-based robotic controllers are typically trained offline and deployed with fixed parameters, their ability to cope with unforeseen changes during operation is limited. Biologically inspired, this work presents a framework for online Continual Reinforcement Learning that enables automated adaptation during deployment. Building on DreamerV3, a model-based Reinforcement Learning algorithm, the proposed method leverages world model prediction residuals to detect out-of-distribution events and automatically trigger finetuning. Adaptation progress is monitored using both task-level performance signals and internal training metrics, allowing convergence to be assessed without external supervision and domain knowledge. The approach is validated on a variety of contemporary continuous control problems, including a quadruped robot in high-fidelity simulation, and a real-world model vehicle. Relevant metrics and their interpretation are presented and discussed, as well as resulting trade-offs described. The results sketch out how autonomous robotic agents could once move beyond static training regimes toward adaptive systems capable of self-reflection and -improvement during operation, just like their biological counterparts.
RONov 4, 2024
Traffic and Safety Rule Compliance of Humans in Diverse Driving SituationsMichael Kurenkov, Sajad Marvi, Julian Schmidt et al.
The increasing interest in autonomous driving systems has highlighted the need for an in-depth analysis of human driving behavior in diverse scenarios. Analyzing human data is crucial for developing autonomous systems that replicate safe driving practices and ensure seamless integration into human-dominated environments. This paper presents a comparative evaluation of human compliance with traffic and safety rules across multiple trajectory prediction datasets, including Argoverse 2, nuPlan, Lyft, and DeepUrban. By defining and leveraging existing safety and behavior-related metrics, such as time to collision, adherence to speed limits, and interactions with other traffic participants, we aim to provide a comprehensive understanding of each datasets strengths and limitations. Our analysis focuses on the distribution of data samples, identifying noise, outliers, and undesirable behaviors exhibited by human drivers in both the training and validation sets. The results underscore the need for applying robust filtering techniques to certain datasets due to high levels of noise and the presence of such undesirable behaviors.
SYSep 19, 2025
Hierarchical Reinforcement Learning with Low-Level MPC for Multi-Agent ControlMax Studt, Georg Schildbach
Achieving safe and coordinated behavior in dynamic, constraint-rich environments remains a major challenge for learning-based control. Pure end-to-end learning often suffers from poor sample efficiency and limited reliability, while model-based methods depend on predefined references and struggle to generalize. We propose a hierarchical framework that combines tactical decision-making via reinforcement learning (RL) with low-level execution through Model Predictive Control (MPC). For the case of multi-agent systems this means that high-level policies select abstract targets from structured regions of interest (ROIs), while MPC ensures dynamically feasible and safe motion. Tested on a predator-prey benchmark, our approach outperforms end-to-end and shielding-based RL baselines in terms of reward, safety, and consistency, underscoring the benefits of combining structured learning with model-based control.
ROMar 4, 2025
World Models for Anomaly Detection during Model-Based Reinforcement Learning InferenceFabian Domberg, Georg Schildbach
Learning-based controllers are often purposefully kept out of real-world applications due to concerns about their safety and reliability. We explore how state-of-the-art world models in Model-Based Reinforcement Learning can be utilized beyond the training phase to ensure a deployed policy only operates within regions of the state-space it is sufficiently familiar with. This is achieved by continuously monitoring discrepancies between a world model's predictions and observed system behavior during inference. It allows for triggering appropriate measures, such as an emergency stop, once an error threshold is surpassed. This does not require any task-specific knowledge and is thus universally applicable. Simulated experiments on established robot control tasks show the effectiveness of this method, recognizing changes in local robot geometry and global gravitational magnitude. Real-world experiments using an agile quadcopter further demonstrate the benefits of this approach by detecting unexpected forces acting on the vehicle. These results indicate how even in new and adverse conditions, safe and reliable operation of otherwise unpredictable learning-based controllers can be achieved.
ROFeb 23, 2022
Using Deep Reinforcement Learning with Automatic Curriculum Learning for Mapless Navigation in IntralogisticsHonghu Xue, Benedikt Hein, Mohamed Bakr et al.
We propose a deep reinforcement learning approach for solving a mapless navigation problem in warehouse scenarios. In our approach, an automation guided vehicle is equipped with LiDAR and frontal RGB sensors and learns to perform a targeted navigation task. The challenges reside in the sparseness of positive samples for learning, multi-modal sensor perception with partial observability, the demand for accurate steering maneuvers together with long training cycles. To address these points, we propose NavACL-Q as a method for automatic curriculum learning in combination with a distributed version of the soft actor-critic algorithm. The performance of the learning algorithm is evaluated exhaustively in an unseen warehouse environment to validate both robustness and generalizability of the learned policy. Results in NVIDIA Isaac Sim demonstrates that our trained agent significantly outperforms a map-based navigation pipeline provided by NVIDIA Isaac Sim with an increased agent-goal distance of 3m and wider initial relative agent-goal rotations of 45 degree. The ablation studies also suggests that NavACL-Q greatly facilitates the learning process with a performance gain of roughly 40% compared to training with random starts and that the utilization of a pre-trained feature extractor manifestly boosts the performance by approximately 60%.
SYApr 12, 2018
On the Application of ISO 26262 in Control Design for Automated VehiclesGeorg Schildbach
Research on automated vehicles has experienced an explosive growth over the past decade. A main obstacle to their practical realization, however, is a convincing safety concept. This question becomes ever more important as more sophisticated algorithms are used and the vehicle automation level increases. The field of functional safety offers a systematic approach to identify possible sources of risk and to improve the safety of a vehicle. It is based on practical experience across the aerospace, process and other industries over multiple decades. This experience is compiled in the functional safety standard for the automotive domain, ISO 26262, which is widely adopted throughout the automotive industry. However, its applicability and relevance for highly automated vehicles is subject to a controversial debate. This paper takes a critical look at the discussion and summarizes the main steps of ISO 26262 for a safe control design for automated vehicles.