RO Mar 3, 2023
Hindsight States: Blending Sim and Real Task Elements for Efficient Reinforcement Learning
Simon Guist, Jan Schneider, Alexander Dittrich et al.
Reinforcement learning has shown great potential in solving complex tasks when large amounts of data can be generated with little effort. In robotics, one approach to generating training data builds on simulations based on dynamics models derived from first principles. However, for tasks that involve, for instance, complex soft robots, devising such models is substantially more challenging. Being able to train effectively with reinforcement learning in increasingly complicated scenarios makes it possible to take advantage of complex systems such as soft robots. Here, we leverage the imbalance in complexity of the dynamics to learn more sample-efficiently. We (i) abstract the task into distinct components, (ii) off-load the simple dynamics parts into the simulation, and (iii) multiply these virtual parts to generate more data in hindsight. Our new method, Hindsight States (HiS), uses this data and selects the most useful transitions for training. It can be used with an arbitrary off-policy algorithm. We validate our method on several challenging simulated tasks and demonstrate that it improves learning both alone and when combined with an existing hindsight algorithm, Hindsight Experience Replay (HER). Finally, we evaluate HiS on a physical system and show that it boosts performance on a complex table tennis task with a muscular robot. Videos and code of the experiments can be found at webdav.tuebingen.mpg.de/his/.
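Steps (i) to (iii) can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D state, the constant-velocity `simulate_ball` model, the distance-based reward, and the top-k selection rule are all hypothetical choices made only to show how one recorded robot episode might be paired with several virtual object trajectories.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 20          # steps in one recorded episode
n_virtual = 8   # virtual ball trajectories paired with the same robot episode

# Stand-in for a recorded "real" robot trajectory (hypothetical 1-D positions).
robot_pos = np.cumsum(rng.normal(0.0, 0.1, T))

def simulate_ball(start, velocity, steps):
    """Hypothetical simple dynamics: a ball moving at constant velocity."""
    return start + velocity * np.arange(steps)

def hindsight_transitions(robot_pos, ball_pos):
    """Pair the recorded robot states with a virtual ball and recompute the reward."""
    reward = -np.abs(robot_pos - ball_pos)  # assumed reward: negative distance
    return np.stack([robot_pos, ball_pos, reward], axis=1)

# Multiply the virtual part: many simulated balls for one real robot episode.
virtual_balls = [
    simulate_ball(rng.uniform(-1.0, 1.0), rng.uniform(-0.1, 0.1), T)
    for _ in range(n_virtual)
]
augmented = np.concatenate(
    [hindsight_transitions(robot_pos, ball) for ball in virtual_balls]
)

# Assumed selection rule: keep the top_k transitions with the highest reward.
top_k = 40
selected = augmented[np.argsort(augmented[:, 2])[-top_k:]]
```

The sketch assumes the virtual object does not influence the robot's recorded motion, which is what allows a single real episode to be reused with many simulated counterparts.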
RO Aug 20, 2024
RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands
Yi Zhao, Le Chen, Jan Schneider et al.
It has been a long-standing research goal to endow robot hands with human-level dexterity. Bi-manual robot piano playing constitutes a task that combines challenges from dynamic tasks, such as generating fast yet precise motions, with slower but contact-rich manipulation problems. Although reinforcement-learning-based approaches have shown promising results in single-task performance, these methods struggle in a multi-song setting. Our work aims to close this gap and thereby enable imitation learning approaches for robot piano playing at scale. To this end, we introduce the Robot Piano 1 Million (RP1M) dataset, containing bi-manual robot piano playing motion data of more than one million trajectories. We formulate finger placements as an optimal transport problem, thus enabling automatic annotation of vast amounts of unlabeled songs. Benchmarking existing imitation learning approaches shows that such approaches reach state-of-the-art robot piano playing performance by leveraging RP1M.
LG Sep 13, 2023
Investigating the Impact of Action Representations in Policy Gradient Algorithms
Jan Schneider, Pierre Schumacher, Daniel Häufle et al.
Reinforcement learning (RL) is a versatile framework for learning to solve complex real-world tasks. However, influences on the learning performance of RL algorithms are often poorly understood in practice. We discuss different analysis techniques and assess their effectiveness for investigating the impact of action representations in RL. Our experiments demonstrate that the action representation can significantly influence the learning performance on popular RL benchmark tasks. The analysis results indicate that some of the performance differences can be attributed to changes in the complexity of the optimization landscape. Finally, we discuss open challenges of analysis techniques for RL algorithms.
LG Jan 12, 2024
Identifying Policy Gradient Subspaces
Jan Schneider, Pierre Schumacher, Simon Guist et al.
Policy gradient methods hold great potential for solving complex continuous control tasks. Still, their training efficiency can be improved by exploiting structure within the optimization problem. Recent work indicates that supervised learning can be accelerated by leveraging the fact that gradients lie in a low-dimensional and slowly-changing subspace. In this paper, we conduct a thorough evaluation of this phenomenon for two popular deep policy gradient methods on various simulated benchmark tasks. Our results demonstrate the existence of such gradient subspaces despite the continuously changing data distribution inherent to reinforcement learning. These findings reveal promising directions for future work on more efficient reinforcement learning, e.g., through improving parameter-space exploration or enabling second-order optimization.
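The notion of gradients lying in a low-dimensional subspace can be made concrete with a small synthetic sketch. It is not the evaluation protocol of the paper: the gradients here are artificial, the subspace is estimated with an SVD of earlier gradients (an assumption made for illustration), and `subspace_fraction` is a hypothetical helper that reports how much of a gradient's squared norm falls inside that subspace.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, k, n_grads = 200, 5, 50   # parameter count, subspace size, number of gradients

# Synthetic gradients: mostly inside a fixed k-dimensional subspace, plus small noise.
basis, _ = np.linalg.qr(rng.standard_normal((dim, k)))
coeffs = rng.standard_normal((n_grads, k))
grads = coeffs @ basis.T + 0.01 * rng.standard_normal((n_grads, dim))

# Estimate the subspace from all but the last gradient (top-k right singular vectors).
_, _, vt = np.linalg.svd(grads[:-1], full_matrices=False)
subspace = vt[:k].T            # shape (dim, k), orthonormal columns

def subspace_fraction(g, subspace):
    """Fraction of the squared norm of g that lies in the span of `subspace`."""
    proj = subspace @ (subspace.T @ g)
    return float(np.dot(proj, proj) / np.dot(g, g))

# A held-out gradient is captured almost entirely; a random direction is not.
frac = subspace_fraction(grads[-1], subspace)
random_frac = subspace_fraction(rng.standard_normal(dim), subspace)
```

For a random direction the expected fraction is roughly `k / dim`, so a value close to 1 for held-out gradients indicates that they concentrate in the estimated subspace.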
RO Apr 10
Sim-to-Real Transfer for Muscle-Actuated Robots via Generalized Actuator Networks
Jan Schneider, Mridul Mahajan, Le Chen et al.
Tendon drives paired with soft muscle actuation enable faster and safer robots while potentially accelerating skill acquisition. Still, these systems are rarely used in practice due to inherent nonlinearities, friction, and hysteresis, which complicate modeling and control. So far, these challenges have hindered policy transfer from simulation to real systems. To bridge this gap, we propose a sim-to-real pipeline that learns a neural network model of this complex actuation and leverages established rigid body simulation for the arm dynamics and interactions with the environment. Our method, called Generalized Actuator Network (GeAN), enables actuation model identification across a wide range of robots by learning directly from joint position trajectories rather than requiring torque sensors. Using GeAN on PAMY2, a tendon-driven robot powered by pneumatic artificial muscles, we successfully deploy precise goal-reaching and dynamic ball-in-a-cup policies trained entirely in simulation. To the best of our knowledge, this result constitutes the first successful sim-to-real transfer for a four-degrees-of-freedom muscle-actuated robot arm.
ML Jul 20, 2025
Learning Nonlinear Causal Reductions to Explain Reinforcement Learning Policies
Armin Kekić, Jan Schneider, Dieter Büchler et al.
Why do reinforcement learning (RL) policies fail or succeed? This is a challenging question due to the complex, high-dimensional nature of agent-environment interactions. In this work, we take a causal perspective on explaining the behavior of RL policies by viewing the states, actions, and rewards as variables in a low-level causal model. We introduce random perturbations to policy actions during execution and observe their effects on the cumulative reward, learning a simplified high-level causal model that explains these relationships. To this end, we develop a nonlinear Causal Model Reduction framework that ensures approximate interventional consistency, meaning the simplified high-level model responds to interventions in a similar way as the original complex system. We prove that for a class of nonlinear causal models, there exists a unique solution that achieves exact interventional consistency, ensuring learned explanations reflect meaningful causal patterns. Experiments on both synthetic causal models and practical RL tasks, including pendulum control and robot table tennis, demonstrate that our approach can uncover important behavioral patterns, biases, and failure modes in trained RL policies.
NA Jul 14, 2017
Quantized-CP Approximation and Sparse Tensor Interpolation of Function Generated Data
Boris N. Khoromskij, Kishore K. Naraparaju, Jan Schneider
In this article we consider iterative schemes to compute the canonical (CP) approximation of quantized data generated by a function discretized on a large uniform grid in an interval on the real line. This paper continues the research on the QTT method [16] developed for the tensor train (TT) approximation of the quantized images of function-related data. In the QTT approach the target vector of length $2^{L}$ is reshaped to an $L$-th order tensor with two entries in each mode (quantized representation) and then approximated by the QTT tensor with $2r^2 L$ parameters, where $r$ is the maximal TT rank. In what follows, we consider the Alternating Least-Squares (ALS) iterative scheme to compute the rank-$r$ CP approximation of the quantized vectors, which requires only $2 r L\ll 2^L$ parameters for storage. In earlier papers [17] such a representation was called the Q$_{Can}$ format, while in this paper we abbreviate it as the QCP representation. We test the ALS algorithm to calculate the QCP approximation on various functions, and in all cases we observed exponential error decay in the QCP rank. The main idea for recovering a discretized function in the rank-$r$ QCP format using a reduced number of function samples, calculated only at $O(2rL)$ grid points, is presented. A special version of the ALS scheme for solving the arising minimization problem is described. This approach can be viewed as a sparse QCP-interpolation method that allows recovery of all $2r L$ representation parameters of the rank-$r$ QCP tensor. Numerical examples show the efficiency of the QCP-ALS type iteration and indicate an exponential convergence rate in $r$.
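The quantized reshaping and the $2rL$ parameter count can be checked on a small example. The sketch below is not the ALS scheme of the paper; it only uses the standard fact that a sampled exponential is exactly rank 1 after quantization. The grid on $[0, 1]$, the function $e^{-3x}$, and the helper `qcp_parameter_count` are illustrative assumptions.

```python
import numpy as np

L = 10                          # number of modes; the vector has length 2**L
n = 2 ** L
x = np.linspace(0.0, 1.0, n)    # uniform grid on [0, 1] (illustrative choice)
h = x[1] - x[0]
v = np.exp(-3.0 * x)            # samples of exp(-3x)

# Quantization: reshape the vector into an L-th order tensor with two entries per mode.
tensor = v.reshape((2,) * L)

# In C order the grid index is i = sum_k i_k * 2**(L-1-k), so exp(-3*h*i) factorizes
# into one 2-vector per mode, i.e. a rank-1 CP representation.
factors = [np.array([1.0, np.exp(-3.0 * h * 2 ** (L - 1 - k))]) for k in range(L)]

rank1 = factors[0]
for f in factors[1:]:
    rank1 = np.multiply.outer(rank1, f)   # build the outer product mode by mode

max_error = np.max(np.abs(rank1 - tensor))

def qcp_parameter_count(r, L):
    """Storage of a rank-r CP tensor with L modes of size 2: 2*r*L numbers."""
    return 2 * r * L
```

Here the rank-1 representation stores `qcp_parameter_count(1, L)` numbers, that is 20 values instead of the 1024 entries of the full vector, and `max_error` is at the level of floating-point round-off.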
NA Oct 21, 2015
Equivalence of anchored and ANOVA spaces via interpolation
Aicke Hinrichs, Jan Schneider
We consider weighted anchored and ANOVA spaces of functions with first order mixed derivatives bounded in $L_p$. Recently, Hefter, Ritter and Wasilkowski established conditions on the weights in the cases $p=1$ and $p=\infty$ which ensure equivalence of the corresponding norms uniformly in the dimension or only polynomially dependent on the dimension. We extend these results to the whole range of $p\in [1,\infty]$. It is shown how this can be achieved via interpolation.