Jakob Thumm

LG
h-index2
5papers
113citations
Novelty48%
AI Score43

5 Papers

ROMay 12, 2022
Provably Safe Deep Reinforcement Learning for Robotic Manipulation in Human Environments

Jakob Thumm, Matthias Althoff

Deep reinforcement learning (RL) has shown promising results in the motion planning of manipulators. However, no method guarantees the safety of highly dynamic obstacles, such as humans, in RL-based manipulator control. This lack of formal safety assurances prevents the application of RL for manipulators in real-world human environments. Therefore, we propose a shielding mechanism that ensures ISO-verified human safety while training and deploying RL algorithms on manipulators. We utilize a fast reachability analysis of humans and manipulators to guarantee that the manipulator comes to a complete stop before a human is within its range. Our proposed method guarantees safety and significantly improves the RL performance by preventing episode-ending collisions. We demonstrate the performance of our proposed method in simulation using human motion capture data.

LGMay 13, 2022
Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking

Hanna Krasowski, Jakob Thumm, Marlon Müller et al.

Ensuring the safety of reinforcement learning (RL) algorithms is crucial to unlock their potential for many real-world tasks. However, vanilla RL and most safe RL approaches do not guarantee safety. In recent years, several methods have been proposed to provide hard safety guarantees for RL, which is essential for applications where unsafe actions could have disastrous consequences. Nevertheless, there is no comprehensive comparison of these provably safe RL methods. Therefore, we introduce a categorization of existing provably safe RL methods, present the conceptual foundations for both continuous and discrete action spaces, and empirically benchmark existing methods. We categorize the methods based on how they adapt the action: action replacement, action projection, and action masking. Our experiments on an inverted pendulum and a quadrotor stabilization task indicate that action replacement is the best-performing approach for these applications despite its comparatively simple realization. Furthermore, adding a reward penalty, every time the safety verification is engaged, improved training performance in our experiments. Finally, we provide practical guidance on selecting provably safe RL approaches depending on the safety specification, RL algorithm, and type of action space.

85.9ROApr 16
Vision-Based Safe Human-Robot Collaboration with Uncertainty Guarantees

Jakob Thumm, Marian Frei, Tianle Ni et al.

We propose a framework for vision-based human pose estimation and motion prediction that gives conformal prediction guarantees for certifiably safe human-robot collaboration. Our framework combines aleatoric uncertainty estimation with OOD detection for high probabilistic confidence. To integrate our pipeline in certifiable safety frameworks, we propose conformal prediction sets for human motion predictions with high, valid confidence. We evaluate our pipeline on recorded human motion data and a real-world human-robot collaboration setting.

MLFeb 20, 2025
Multi-Objective Causal Bayesian Optimization

Shriya Bhatija, Paul-David Zuercher, Jakob Thumm et al.

In decision-making problems, the outcome of an intervention often depends on the causal relationships between system components and is highly costly to evaluate. In such settings, causal Bayesian optimization (CBO) can exploit the causal relationships between the system variables and sequentially perform interventions to approach the optimum with minimal data. Extending CBO to the multi-outcome setting, we propose Multi-Objective Causal Bayesian Optimization (MO-CBO), a paradigm for identifying Pareto-optimal interventions within a known multi-target causal graph. We first derive a graphical characterization for potentially optimal sets of variables to intervene upon. Showing that any MO-CBO problem can be decomposed into several traditional multi-objective optimization tasks, we then introduce an algorithm that sequentially balances exploration across these tasks using relative hypervolume improvement. The proposed method will be validated on both synthetic and real-world causal graphs, demonstrating its superiority over traditional (non-causal) multi-objective Bayesian optimization in settings where causal information is available.

LGJun 6, 2024
Excluding the Irrelevant: Focusing Reinforcement Learning through Continuous Action Masking

Roland Stolz, Hanna Krasowski, Jakob Thumm et al.

Continuous action spaces in reinforcement learning (RL) are commonly defined as multidimensional intervals. While intervals usually reflect the action boundaries for tasks well, they can be challenging for learning because the typically large global action space leads to frequent exploration of irrelevant actions. Yet, little task knowledge can be sufficient to identify significantly smaller state-specific sets of relevant actions. Focusing learning on these relevant actions can significantly improve training efficiency and effectiveness. In this paper, we propose to focus learning on the set of relevant actions and introduce three continuous action masking methods for exactly mapping the action space to the state-dependent set of relevant actions. Thus, our methods ensure that only relevant actions are executed, enhancing the predictability of the RL agent and enabling its use in safety-critical applications. We further derive the implications of the proposed methods on the policy gradient. Using proximal policy optimization (PPO), we evaluate our methods on four control tasks, where the relevant action set is computed based on the system dynamics and a relevant state set. Our experiments show that the three action masking methods achieve higher final rewards and converge faster than the baseline without action masking.