LG · Jun 2
Post-Hoc Robustness for Model-Based Reinforcement Learning
Siemen Herremans, Ali Anwar, Siegfried Mercelis
To improve the real-world applicability of reinforcement learning (RL), the field of adversarially robust RL studies how to train agents under adversarial environment perturbations. In this setting, a protagonist agent optimizes a policy under environmental perturbations from an adversary, resulting in a zero-sum Markov game. When adversarially robust RL is combined with model-based RL, the adversary can target a learned transition model instead of the training environment. Extending this idea, this work introduces post-hoc robustification of deep RL agents at inference time. By using the learned model in combination with a trained nominal policy, our approach performs a robust policy improvement step. The goal is to improve robustness without any additional training of neural networks. Specifically, we utilize model-predictive control under adversarial rollouts, which are approximated via projected gradient descent within a bounded uncertainty set. Furthermore, these offline rollouts are performed while considering and mitigating out-of-distribution issues. The proposed methodology is validated by demonstrating significant improvements in robustness when the algorithm is evaluated in perturbed Gymnasium MuJoCo environments, while considering the computational limitations of the post-hoc inference setting.
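The core loop described above can be illustrated with a small sketch. Model-predictive control scores each candidate first action by an adversarial rollout of a learned model. The adversary's perturbation is found with projected gradient descent inside a bounded (here, box-shaped) uncertainty set. This is a minimal illustration under loudly stated assumptions, not the authors' implementation. The toy linear "learned model", the quadratic reward, the nominal policy, the additive state perturbation, the finite-difference gradients, and all names (`rollout_return`, `pgd_adversary`, `robust_mpc_action`) are hypothetical. The out-of-distribution mitigation mentioned in the abstract is not modelled.

```python
"""Hypothetical sketch: robust MPC with PGD-approximated adversarial rollouts.

The model, reward, policy, uncertainty set and step sizes are illustrative
assumptions, not the method from the paper.
"""
from typing import Callable, List, Tuple

Model = Callable[[float, float], float]   # (state, action) -> predicted next state
Policy = Callable[[float], float]         # state -> action (the "nominal" policy)


def rollout_return(model: Model, policy: Policy, s0: float,
                   first_action: float, deltas: List[float]) -> float:
    """Return of a len(deltas)-step model rollout: first_action at t=0, then the
    nominal policy, with additive perturbation deltas[t] on each prediction."""
    s, total = s0, 0.0
    for t, d in enumerate(deltas):
        a = first_action if t == 0 else policy(s)
        s = model(s, a) + d                  # perturbed transition
        total += -(s * s) - 0.1 * a * a      # hypothetical quadratic reward
    return total


def pgd_adversary(model: Model, policy: Policy, s0: float, a0: float,
                  horizon: int, eps: float, steps: int = 20,
                  lr: float = 0.1, fd: float = 1e-4) -> List[float]:
    """Projected gradient descent on the rollout return w.r.t. the perturbations,
    projected onto the box [-eps, eps]^horizon (the assumed uncertainty set).
    Gradients are approximated with central finite differences."""
    deltas = [0.0] * horizon
    for _ in range(steps):
        grad = []
        for i in range(horizon):
            up, dn = deltas.copy(), deltas.copy()
            up[i] += fd
            dn[i] -= fd
            g = (rollout_return(model, policy, s0, a0, up)
                 - rollout_return(model, policy, s0, a0, dn)) / (2 * fd)
            grad.append(g)
        # The adversary minimises the return: step against the gradient, then project.
        deltas = [max(-eps, min(eps, d - lr * g)) for d, g in zip(deltas, grad)]
    return deltas


def robust_mpc_action(model: Model, policy: Policy, s0: float,
                      candidates: List[float], horizon: int = 5,
                      eps: float = 0.1) -> Tuple[float, float]:
    """Choose the candidate first action with the best worst-case return,
    where the worst case is approximated by pgd_adversary."""
    best_a, best_ret = candidates[0], float("-inf")
    for a in candidates:
        deltas = pgd_adversary(model, policy, s0, a, horizon, eps)
        ret = rollout_return(model, policy, s0, a, deltas)
        if ret > best_ret:
            best_a, best_ret = a, ret
    return best_a, best_ret
```

For example, the candidates could be the nominal policy's action plus small offsets. With `model = lambda s, a: 0.9 * s + 0.5 * a` and `policy = lambda s: -0.8 * s`, calling `robust_mpc_action(model, policy, 1.0, [policy(1.0) + 0.1 * k for k in range(-3, 4)])` returns the offset action whose PGD-approximated worst-case return is highest. No neural network is retrained in this loop, which mirrors the post-hoc setting.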
RO · Nov 17, 2023
Autonomous Port Navigation With Ranging Sensors Using Model-Based Reinforcement Learning
Siemen Herremans, Ali Anwar, Arne Troch et al.
Autonomous shipping has recently gained much interest in the research community. However, little research focuses on inland and port navigation, even though countries such as Belgium and the Netherlands have identified it as an essential step towards a sustainable future. These environments pose unique challenges, since they can contain dynamic obstacles that do not broadcast their location, such as small vessels, kayaks or buoys. Therefore, this research proposes a navigational algorithm that can navigate an inland vessel in a wide variety of complex port scenarios, using ranging sensors to observe the environment. The proposed methodology is based on a machine learning approach that has recently set benchmark results in various domains: model-based reinforcement learning. Because the port environments are randomized during training, the trained model can navigate in scenarios it never encountered during training. Furthermore, results show that our approach outperforms the commonly used dynamic window approach and a benchmark model-free reinforcement learning algorithm. This work is therefore a significant step towards vessels that can navigate autonomously in complex port scenarios.
RO · Jun 27, 2025
ASVSim (AirSim for Surface Vehicles): A High-Fidelity Simulation Framework for Autonomous Surface Vehicle Research
Bavo Lesy, Siemen Herremans, Robin Kerstens et al.
The transport industry has recently shown significant interest in unmanned surface vehicles (USVs), specifically for port and inland waterway transport. These systems can improve operational efficiency and safety, which is especially relevant in the European Union, where initiatives such as the Green Deal are driving a shift towards increased use of inland waterways. At the same time, a shortage of qualified personnel is accelerating the adoption of autonomous solutions. However, there is a notable lack of open-source, high-fidelity simulation frameworks and datasets for developing and evaluating such solutions. To address these challenges, we introduce AirSim for Surface Vehicles (ASVSim), an open-source simulation framework specifically designed for autonomous shipping research in inland and port environments. The framework combines simulated vessel dynamics with marine sensor simulation capabilities, including radar and camera systems, and supports the generation of synthetic datasets for training computer vision models and reinforcement learning agents. Built upon Cosys-AirSim, ASVSim provides a comprehensive platform for developing autonomous navigation algorithms and generating synthetic datasets. The simulator supports research on both traditional control methods and deep learning-based approaches. Through limited experiments, we demonstrate the potential of the simulator in these research areas. ASVSim is provided as an open-source project under the MIT license, making autonomous navigation research accessible to a larger part of the ocean engineering community.
LG · Jun 14, 2024
Robust Model-Based Reinforcement Learning with an Adversarial Auxiliary Model
Siemen Herremans, Ali Anwar, Siegfried Mercelis
Reinforcement learning has demonstrated impressive performance in various challenging problems such as robotics, board games, and classical arcade games. However, its real-world applications can be hindered by the absence of robustness and safety in the learned policies. More specifically, an RL agent that trains in a certain Markov decision process (MDP) often struggles to perform well in nearly identical MDPs. To address this issue, we employ the framework of Robust MDPs (RMDPs) in a model-based setting and introduce a novel learned transition model. Our method specifically incorporates an auxiliary pessimistic model, updated adversarially, to estimate the worst-case MDP within a Kullback-Leibler uncertainty set. In comparison to several existing works, our work does not impose any additional conditions on the training environment, such as the need for a parametric simulator. To test the effectiveness of the proposed pessimistic model in enhancing policy robustness, we integrate it into a practical RL algorithm, called Robust Model-Based Policy Optimization (RMBPO). Our experimental results indicate a notable improvement in policy robustness on high-dimensional MuJoCo control tasks, with the auxiliary model enhancing the performance of the learned policy in distorted MDPs. We further explore the learned deviation between the proposed auxiliary world model and the nominal model, to examine how pessimism is achieved. By learning a pessimistic world model and demonstrating its role in improving policy robustness, our research contributes towards making (model-based) RL more robust.
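A small tabular example can make the idea of a worst-case MDP within a Kullback-Leibler uncertainty set concrete. For a nominal next-state distribution `p` and next-state values `v`, the distribution `q` minimising the expected value subject to KL(q‖p) ≤ ε has the exponentially tilted form q ∝ p·exp(−v/β). The temperature β is chosen so that the KL budget is met. The sketch below finds β by log-space bisection. It is only an illustration of the uncertainty set under these assumptions. It is not the adversarially updated neural auxiliary model used in RMBPO, and the function names (`tilt`, `kl`, `worst_case_kl`) and bisection bounds are hypothetical.

```python
"""Hypothetical tabular sketch of a KL-constrained worst-case transition distribution."""
import math
from typing import List


def tilt(p: List[float], v: List[float], beta: float) -> List[float]:
    """Exponentially tilted distribution q_i proportional to p_i * exp(-v_i / beta)."""
    m = min(v)  # shift so every exponent is <= 0 (avoids overflow)
    w = [pi * math.exp(-(vi - m) / beta) for pi, vi in zip(p, v)]
    z = sum(w)
    return [wi / z for wi in w]


def kl(q: List[float], p: List[float]) -> float:
    """KL(q || p) for discrete distributions (terms with q_i = 0 contribute 0)."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def worst_case_kl(p: List[float], v: List[float], eps: float,
                  lo: float = 1e-4, hi: float = 1e4, iters: int = 100) -> List[float]:
    """Approximate argmin_q E_q[v] subject to KL(q || p) <= eps.

    KL(tilt(p, v, beta) || p) decreases as beta grows, so we bisect on beta.
    Assumes `hi` is large enough that the tilted distribution at `hi` is feasible.
    """
    if kl(tilt(p, v, lo), p) <= eps:
        return tilt(p, v, lo)  # budget allows (nearly) all mass on the lowest values
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisection in log-space
        if kl(tilt(p, v, mid), p) > eps:
            lo = mid  # too pessimistic: raise the temperature
        else:
            hi = mid  # feasible: try a more pessimistic temperature
    return tilt(p, v, hi)  # hi always satisfies the KL constraint
```

For instance, with a uniform `p = [0.25] * 4`, values `v = [1.0, 2.0, 3.0, 4.0]` and `eps = 0.1`, the result shifts probability mass towards the low-value states while staying on the boundary of the KL ball. This is the kind of pessimism the auxiliary model is meant to learn in the continuous, high-dimensional setting.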