Erik Schaffernicht

h-index17

9papers

24citations

Novelty57%

AI Score32

Ranked #125,577 of 194,257 authors (top 65%)#27,613 in LG (top 69%)

9 Papers

9.0AIOct 3, 2023Code

Prioritized Soft Q-Decomposition for Lexicographic Reinforcement Learning

Finn Rietz, Erik Schaffernicht, Stefan Heinrich et al.

Reinforcement learning (RL) for complex tasks remains a challenge, primarily due to the difficulties of engineering scalar reward functions and the inherent inefficiency of training models from scratch. Instead, it would be better to specify complex tasks in terms of elementary subtasks and to reuse subtask solutions whenever possible. In this work, we address continuous space lexicographic multi-objective RL problems, consisting of prioritized subtasks, which are notoriously difficult to solve. We show that these can be scalarized with a subtask transformation and then solved incrementally using value decomposition. Exploiting this insight, we propose prioritized soft Q-decomposition (PSQD), a novel algorithm for learning and adapting subtask solutions under lexicographic priorities in continuous state-action spaces. PSQD offers the ability to reuse previously learned subtask solutions in a zero-shot composition, followed by an adaptation step. Its ability to use retained subtask training data for offline learning eliminates the need for new environment interaction during adaptation. We demonstrate the efficacy of our approach by presenting successful learning, reuse, and adaptation results for both low- and high-dimensional simulated robot control tasks, as well as offline learning results. In contrast to baseline approaches, PSQD does not trade off between conflicting subtasks or priority constraints and satisfies subtask priorities during learning. PSQD provides an intuitive framework for tackling complex RL problems, offering insights into the inner workings of the subtask composition.

4.6LGJul 29, 2024Code

On the Effects of Irrelevant Variables in Treatment Effect Estimation with Deep Disentanglement

Ahmad Saeed Khan, Erik Schaffernicht, Johannes Andreas Stork

Estimating treatment effects from observational data is paramount in healthcare, education, and economics, but current deep disentanglement-based methods to address selection bias are insufficiently handling irrelevant variables. We demonstrate in experiments that this leads to prediction errors. We disentangle pre-treatment variables with a deep embedding method and explicitly identify and represent irrelevant variables, additionally to instrumental, confounding and adjustment latent factors. To this end, we introduce a reconstruction objective and create an embedding space for irrelevant variables using an attached autoencoder. Instead of relying on serendipitous suppression of irrelevant variables as in previous deep disentanglement approaches, we explicitly force irrelevant variables into this embedding space and employ orthogonalization to prevent irrelevant information from leaking into the latent space representations of the other factors. Our experiments with synthetic and real-world benchmark datasets show that we can better identify irrelevant variables and more precisely predict treatment effects than previous methods, while prediction quality degrades less when additional irrelevant variables are introduced.

1.8LGSep 20, 2022

Towards Task-Prioritized Policy Composition

Finn Rietz, Erik Schaffernicht, Todor Stoyanov et al.

Combining learned policies in a prioritized, ordered manner is desirable because it allows for modular design and facilitates data reuse through knowledge transfer. In control theory, prioritized composition is realized by null-space control, where low-priority control actions are projected into the null-space of high-priority control actions. Such a method is currently unavailable for Reinforcement Learning. We propose a novel, task-prioritized composition framework for Reinforcement Learning, which involves a novel concept: The indifferent-space of Reinforcement Learning policies. Our framework has the potential to facilitate knowledge transfer and modular design while greatly increasing data efficiency and data reuse for Reinforcement Learning agents. Further, our approach can ensure high-priority constraint satisfaction, which makes it promising for learning in safety-critical domains like robotics. Unlike null-space control, our approach allows learning globally optimal policies for the compound task by online learning in the indifference-space of higher-level policies after initial compound policy construction.

10.4LGMay 8, 2024Code

DataSP: A Differential All-to-All Shortest Path Algorithm for Learning Costs and Predicting Paths with Context

Alan A. Lahoud, Erik Schaffernicht, Johannes A. Stork

Learning latent costs of transitions on graphs from trajectories demonstrations under various contextual features is challenging but useful for path planning. Yet, existing methods either oversimplify cost assumptions or scale poorly with the number of observed trajectories. This paper introduces DataSP, a differentiable all-to-all shortest path algorithm to facilitate learning latent costs from trajectories. It allows to learn from a large number of trajectories in each learning step without additional computation. Complex latent cost functions from contextual features can be represented in the algorithm through a neural network approximation. We further propose a method to sample paths from DataSP in order to reconstruct/mimic observed paths' distributions. We prove that the inferred distribution follows the maximum entropy principle. We show that DataSP outperforms state-of-the-art differentiable combinatorial solver and classical machine learning approaches in predicting paths on graphs.

2.6LGJun 5, 2024Code

Learning Solutions of Stochastic Optimization Problems with Bayesian Neural Networks

Alan A. Lahoud, Erik Schaffernicht, Johannes A. Stork

Mathematical solvers use parametrized Optimization Problems (OPs) as inputs to yield optimal decisions. In many real-world settings, some of these parameters are unknown or uncertain. Recent research focuses on predicting the value of these unknown parameters using available contextual features, aiming to decrease decision regret by adopting end-to-end learning approaches. However, these approaches disregard prediction uncertainty and therefore make the mathematical solver susceptible to provide erroneous decisions in case of low-confidence predictions. We propose a novel framework that models prediction uncertainty with Bayesian Neural Networks (BNNs) and propagates this uncertainty into the mathematical solver with a Stochastic Programming technique. The differentiable nature of BNNs and differentiable mathematical solvers allow for two different learning approaches: In the Decoupled learning approach, we update the BNN weights to increase the quality of the predictions' distribution of the OP parameters, while in the Combined learning approach, we update the weights aiming to directly minimize the expected OP's cost function in a stochastic end-to-end fashion. We do an extensive evaluation using synthetic data with various noise properties and a real dataset, showing that decisions regret are generally lower (better) with both proposed methods.

2.6LGMay 2, 2024

Towards Interpretable Reinforcement Learning with Constrained Normalizing Flow Policies

Finn Rietz, Erik Schaffernicht, Stefan Heinrich et al.

Reinforcement learning policies are typically represented by black-box neural networks, which are non-interpretable and not well-suited for safety-critical domains. To address both of these issues, we propose constrained normalizing flow policies as interpretable and safe-by-construction policy models. We achieve safety for reinforcement learning problems with instantaneous safety constraints, for which we can exploit domain knowledge by analytically constructing a normalizing flow that ensures constraint satisfaction. The normalizing flow corresponds to an interpretable sequence of transformations on action samples, each ensuring alignment with respect to a particular constraint. Our experiments reveal benefits beyond interpretability in an easier learning objective and maintained constraint satisfaction throughout the entire learning process. Our approach leverages constraints over reward engineering while offering enhanced interpretability, safety, and direct means of providing domain knowledge to the agent without relying on complex reward functions.

1.6ROOct 8, 2018

Safe-To-Explore State Spaces: Ensuring Safe Exploration in Policy Search with Hierarchical Task Optimization

Jens Lundell, Robert Krug, Erik Schaffernicht et al.

Policy search reinforcement learning allows robots to acquire skills by themselves. However, the learning procedure is inherently unsafe as the robot has no a-priori way to predict the consequences of the exploratory actions it takes. Therefore, exploration can lead to collisions with the potential to harm the robot and/or the environment. In this work we address the safety aspect by constraining the exploration to happen in safe-to-explore state spaces. These are formed by decomposing target skills (e.g., grasping) into higher ranked sub-tasks (e.g., collision avoidance, joint limit avoidance) and lower ranked movement tasks (e.g., reaching). Sub-tasks are defined as concurrent controllers (policies) in different operational spaces together with associated Jacobians representing their joint-space mapping. Safety is ensured by only learning policies corresponding to lower ranked sub-tasks in the redundant null space of higher ranked ones. As a side benefit, learning in sub-manifolds of the state-space also facilitates sample efficiency. Reaching skills performed in simulation and grasping skills performed on a real robot validate the usefulness of the proposed approach.

4.2ROJan 21, 2018

A Next-Best-Smell Approach for Remote Gas Detection with a Mobile Robot

Riccardo Polvara, Marco Trabattoni, Tomasz Piotr Kucner et al.

The problem of gas detection is relevant to many real-world applications, such as leak detection in industrial settings and landfill monitoring. Using mobile robots for gas detection has several advantages and can reduce danger for humans. In our work, we address the problem of planning a path for a mobile robotic platform equipped with a remote gas sensor, which minimizes the time to detect all gas sources in a given environment. We cast this problem as a coverage planning problem by defining a basic sensing operation -- a scan with the remote gas sensor -- as the field of "view" of the sensor. Given the computing effort required by previously proposed offline approaches, in this paper we suggest a online coverage algorithm, called Next-Best-Smell, adapted from the Next-Best-View class of exploration algorithms. Our algorithm evaluates candidate locations with a global utility function, which combines utility values for travel distance, information gain, and sensing time, using Multi-Criteria Decision Making. In our experiments, conducted both in simulation and with a real robot, we found the performance of the Next-Best-Smell approach to be comparable with that of the state-of-the-art offline algorithm, at much lower computational cost.

5.4ROSep 8, 2016

An Eigenshapes Approach to Compressed Signed Distance Fields and Their Utility in Robot Mapping

Daniel R. Canelhas, Erik Schaffernicht, Todor Stoyanov et al.

In order to deal with the scaling problem of volumetric map representations we propose spatially local methods for high-ratio compression of 3D maps, represented as truncated signed distance fields. We show that these compressed maps can be used as meaningful descriptors for selective decompression in scenarios relevant to robotic applications. As compression methods, we compare using PCA-derived low-dimensional bases to non-linear auto-encoder networks and novel mixed architectures that combine both. Selecting two application-oriented performance metrics, we evaluate the impact of different compression rates on reconstruction fidelity as well as to the task of map-aided ego-motion estimation. It is demonstrated that lossily compressed distance fields used as cost functions for ego-motion estimation, can outperform their uncompressed counterparts in challenging scenarios from standard RGB-D data-sets.