LGJul 28, 2023Code
Benchmarking Offline Reinforcement Learning on Real-Robot HardwareNico Gürtler, Sebastian Blaes, Pavel Kolev et al.
Learning policies from previously recorded data is a promising direction for real-world robotics tasks, as online learning is often infeasible. Dexterous manipulation in particular remains an open problem in its general form. The combination of offline reinforcement learning with large diverse datasets, however, has the potential to lead to a breakthrough in this challenging domain analogously to the rapid progress made in supervised learning in recent years. To coordinate the efforts of the research community toward tackling this problem, we propose a benchmark including: i) a large collection of data for offline learning from a dexterous manipulation platform on two tasks, obtained with capable RL agents trained in simulation; ii) the option to execute learned policies on a real-world robotic system and a simulation for efficient debugging. We evaluate prominent open-sourced offline reinforcement learning algorithms on the datasets and provide a reproducible experimental setup for offline reinforcement learning on real systems.
ROAug 15, 2023
Real Robot Challenge 2022: Learning Dexterous Manipulation from Offline Data in the Real WorldNico Gürtler, Felix Widmaier, Cansu Sancaktar et al. · deepmind
Experimentation on real robots is demanding in terms of time and costs. For this reason, a large part of the reinforcement learning (RL) community uses simulators to develop and benchmark algorithms. However, insights gained in simulation do not necessarily translate to real robots, in particular for tasks involving complex interactions with the environment. The Real Robot Challenge 2022 therefore served as a bridge between the RL and robotics communities by allowing participants to experiment remotely with a real robot - as easily as in simulation. In the last years, offline reinforcement learning has matured into a promising paradigm for learning from pre-collected datasets, alleviating the reliance on expensive online interactions. We therefore asked the participants to learn two dexterous manipulation tasks involving pushing, grasping, and in-hand orientation from provided real-robot datasets. An extensive software documentation and an initial stage based on a simulation of the real set-up made the competition particularly accessible. By giving each team plenty of access budget to evaluate their offline-learned policies on a cluster of seven identical real TriFinger platforms, we organized an exciting competition for machine learners and roboticists alike. In this work we state the rules of the competition, present the methods used by the winning teams and compare their results with a benchmark of state-of-the-art offline RL algorithms on the challenge datasets.
ROMay 19, 2022
Dexterous Robotic Manipulation using Deep Reinforcement Learning and Knowledge Transfer for Complex Sparse Reward-based TasksQiang Wang, Francisco Roldan Sanchez, Robert McCarthy et al.
This paper describes a deep reinforcement learning (DRL) approach that won Phase 1 of the Real Robot Challenge (RRC) 2021, and then extends this method to a more difficult manipulation task. The RRC consisted of using a TriFinger robot to manipulate a cube along a specified positional trajectory, but with no requirement for the cube to have any specific orientation. We used a relatively simple reward function, a combination of goal-based sparse reward and distance reward, in conjunction with Hindsight Experience Replay (HER) to guide the learning of the DRL agent (Deep Deterministic Policy Gradient (DDPG)). Our approach allowed our agents to acquire dexterous robotic manipulation strategies in simulation. These strategies were then applied to the real robot and outperformed all other competition submissions, including those using more traditional robotic control techniques, in the final evaluation stage of the RRC. Here we extend this method, by modifying the task of Phase 1 of the RRC to require the robot to maintain the cube in a particular orientation, while the cube is moved along the required positional trajectory. The requirement to also orient the cube makes the agent unable to learn the task through blind exploration due to increased problem complexity. To circumvent this issue, we make novel use of a Knowledge Transfer (KT) technique that allows the strategies learned by the agent in the original task (which was agnostic to cube orientation) to be transferred to this task (where orientation matters). KT allowed the agent to learn and perform the extended task in the simulator, which improved the average positional deviation from 0.134 m to 0.02 m, and average orientation deviation from 142° to 76° during evaluation. This KT concept shows good generalisation properties and could be applied to any actor-critic learning algorithm.
AIMar 21, 2024Code
Language Models Can Reduce Asymmetry in Information MarketsNasim Rahaman, Martin Weiss, Manuel Wüthrich et al.
This work addresses the buyer's inspection paradox for information markets. The paradox is that buyers need to access information to determine its value, while sellers need to limit access to prevent theft. To study this, we introduce an open-source simulated digital marketplace where intelligent agents, powered by language models, buy and sell information on behalf of external participants. The central mechanism enabling this marketplace is the agents' dual capabilities: they not only have the capacity to assess the quality of privileged information but also come equipped with the ability to forget. This ability to induce amnesia allows vendors to grant temporary access to proprietary information, significantly reducing the risk of unauthorized retention while enabling agents to accurately gauge the information's relevance to specific queries or tasks. To perform well, agents must make rational decisions, strategically explore the marketplace through generated sub-queries, and synthesize answers from purchased information. Concretely, our experiments (a) uncover biases in language models leading to irrational behavior and evaluate techniques to mitigate these biases, (b) investigate how price affects demand in the context of informational goods, and (c) show that inspection and higher budgets both lead to higher quality outcomes.
ROOct 8, 2020Code
CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer LearningOssama Ahmed, Frederik Träuble, Anirudh Goyal et al.
Despite recent successes of reinforcement learning (RL), it remains a challenge for agents to transfer learned skills to related environments. To facilitate research addressing this problem, we propose CausalWorld, a benchmark for causal structure and transfer learning in a robotic manipulation environment. The environment is a simulation of an open-source robotic platform, hence offering the possibility of sim-to-real transfer. Tasks consist of constructing 3D shapes from a given set of blocks - inspired by how children learn to build complex structures. The key strength of CausalWorld is that it provides a combinatorial family of such tasks with common causal structure and underlying factors (including, e.g., robot and object masses, colors, sizes). The user (or the agent) may intervene on all causal variables, which allows for fine-grained control over how similar different tasks (or task distributions) are. One can thus easily define training and evaluation distributions of a desired difficulty level, targeting a specific form of generalization (e.g., only changes in appearance or object mass). Further, this common parametrization facilitates defining curricula by interpolating between an initial and a target task. While users may define their own task distributions, we present eight meaningful distributions as concrete benchmarks, ranging from simple to very challenging, all of which require long-horizon planning as well as precise low-level motor control. Finally, we provide baseline results for a subset of these tasks on distinct training curricula and corresponding evaluation protocols, verifying the feasibility of the tasks in this benchmark.
ROAug 8, 2020Code
TriFinger: An Open-Source Robot for Learning DexterityManuel Wüthrich, Felix Widmaier, Felix Grimminger et al.
Dexterous object manipulation remains an open problem in robotics, despite the rapid progress in machine learning during the past decade. We argue that a hindrance is the high cost of experimentation on real systems, in terms of both time and money. We address this problem by proposing an open-source robotic platform which can safely operate without human supervision. The hardware is inexpensive (about \SI{5000}[\$]{}) yet highly dynamic, robust, and capable of complex interaction with external objects. The software operates at 1-kilohertz and performs safety checks to prevent the hardware from breaking. The easy-to-use front-end (in C++ and Python) is suitable for real-time control as well as deep reinforcement learning. In addition, the software framework is largely robot-agnostic and can hence be used independently of the hardware proposed herein. Finally, we illustrate the potential of the proposed platform through a number of experiments, including real-time optimal control, deep reinforcement learning from scratch, throwing, and writing.
ROSep 30, 2019Code
An Open Torque-Controlled Modular Robot Architecture for Legged Locomotion ResearchFelix Grimminger, Avadesh Meduri, Majid Khadiv et al.
We present a new open-source torque-controlled legged robot system, with a low-cost and low-complexity actuator module at its core. It consists of a high-torque brushless DC motor and a low-gear-ratio transmission suitable for impedance and force control. We also present a novel foot contact sensor suitable for legged locomotion with hard impacts. A 2.2 kg quadruped robot with a large range of motion is assembled from eight identical actuator modules and four lower legs with foot contact sensors. Leveraging standard plastic 3D printing and off-the-shelf parts results in a lightweight and inexpensive robot, allowing for rapid distribution and duplication within the research community. We systematically characterize the achieved impedance at the foot in both static and dynamic scenarios, and measure a maximum dimensionless leg stiffness of 10.8 without active damping, which is comparable to the leg stiffness of a running human. Finally, to demonstrate the capabilities of the quadruped, we present a novel controller which combines feedforward contact forces computed from a kino-dynamic optimizer with impedance control of the center of mass and base orientation. The controller can regulate complex motions while being robust to environmental uncertainty.
ROOct 16, 2016Code
Probabilistic Articulated Real-Time Tracking for Robot ManipulationCristina Garcia Cifuentes, Jan Issac, Manuel Wüthrich et al.
We propose a probabilistic filtering method which fuses joint measurements with depth images to yield a precise, real-time estimate of the end-effector pose in the camera frame. This avoids the need for frame transformations when using it in combination with visual object tracking methods. Precision is achieved by modeling and correcting biases in the joint measurements as well as inaccuracies in the robot model, such as poor extrinsic camera calibration. We make our method computationally efficient through a principled combination of Kalman filtering of the joint measurements and asynchronous depth-image updates based on the Coordinate Particle Filter. We quantitatively evaluate our approach on a dataset recorded from a real robotic platform, annotated with ground truth from a motion capture system. We show that our approach is robust and accurate even under challenging conditions such as fast motion, significant and long-term occlusions, and time-varying biases. We release the dataset along with open-source code of our approach to allow for quantitative comparison with alternative approaches.
GTSep 3, 2023
Generative Social ChoiceSara Fish, Paul Gölz, David C. Parkes et al.
The mathematical study of voting, social choice theory, has traditionally only been applicable to choices among a few predetermined alternatives, but not to open-ended decisions such as collectively selecting a textual statement. We introduce generative social choice, a design methodology for open-ended democratic processes that combines the rigor of social choice theory with the capability of large language models to generate text and extrapolate preferences. Our framework divides the design of AI-augmented democratic processes into two components: first, proving that the process satisfies representation guarantees when given access to oracle queries; second, empirically validating that these queries can be approximately implemented using a large language model. We apply this framework to the problem of summarizing free-form opinions into a proportionally representative slate of opinion statements; specifically, we develop a democratic process with representation guarantees and use this process to portray the opinions of participants in a survey about abortion policy. In a trial with 100 representative US residents, we find that 84 out of 100 participants feel "excellently" or "exceptionally" represented by the slate of five statements we extracted.
ROSep 22, 2021
Real Robot Challenge: A Robotics Competition in the CloudStefan Bauer, Felix Widmaier, Manuel Wüthrich et al.
Dexterous manipulation remains an open problem in robotics. To coordinate efforts of the research community towards tackling this problem, we propose a shared benchmark. We designed and built robotic platforms that are hosted at MPI for Intelligent Systems and can be accessed remotely. Each platform consists of three robotic fingers that are capable of dexterous object manipulation. Users are able to control the platforms remotely by submitting code that is executed automatically, akin to a computational cluster. Using this setup, i) we host robotics competitions, where teams from anywhere in the world access our platforms to tackle challenging tasks ii) we publish the datasets collected during these competitions (consisting of hundreds of robot hours), and iii) we give researchers access to these platforms for their own projects.
ROAug 22, 2021
Transferring Dexterous Manipulation from GPU Simulation to a Remote Real-World TriFingerArthur Allshire, Mayank Mittal, Varun Lodaya et al.
We present a system for learning a challenging dexterous manipulation task involving moving a cube to an arbitrary 6-DoF pose with only 3-fingers trained with NVIDIA's IsaacGym simulator. We show empirical benefits, both in simulation and sim-to-real transfer, of using keypoints as opposed to position+quaternion representations for the object pose in 6-DoF for policy observations and in reward calculation to train a model-free reinforcement learning agent. By utilizing domain randomization strategies along with the keypoint representation of the pose of the manipulated object, we achieve a high success rate of 83% on a remote TriFinger system maintained by the organizers of the Real Robot Challenge. With the aim of assisting further research in learning in-hand manipulation, we make the codebase of our system, along with trained checkpoints that come with billions of steps of experience available, at https://s2r2-ig.github.io
LGJul 12, 2021
The Role of Pretrained Representations for the OOD Generalization of Reinforcement Learning AgentsAndrea Dittadi, Frederik Träuble, Manuel Wüthrich et al.
Building sample-efficient agents that generalize out-of-distribution (OOD) in real-world settings remains a fundamental unsolved problem on the path towards achieving higher-level cognition. One particularly promising approach is to begin with low-dimensional, pretrained representations of our world, which should facilitate efficient downstream learning and generalization. By training 240 representations and over 10,000 reinforcement learning (RL) policies on a simulated robotic setup, we evaluate to what extent different properties of pretrained VAE-based representations affect the OOD generalization of downstream agents. We observe that many agents are surprisingly robust to realistic distribution shifts, including the challenging sim-to-real case. In addition, we find that the generalization performance of a simple downstream proxy task reliably predicts the generalization performance of our RL agents under a wide range of OOD settings. Such proxy tasks can thus be used to select pretrained representations that will lead to agents that generalize.
LGApr 29, 2021
Regret Bounds for Gaussian-Process Optimization in Large DomainsManuel Wüthrich, Bernhard Schölkopf, Andreas Krause
The goal of this paper is to characterize Gaussian-Process optimization in the setting where the function domain is large relative to the number of admissible function evaluations, i.e., where it is impossible to find the global optimum. We provide upper bounds on the suboptimality (Bayesian simple regret) of the solution found by optimization strategies that are closely related to the widely used expected improvement (EI) and upper confidence bound (UCB) algorithms. These regret bounds illuminate the relationship between the number of evaluations, the domain size (i.e. cardinality of finite domains / Lipschitz constant of the covariance function in continuous domains), and the optimality of the retrieved function value. In particular, we show that even when the number of evaluations is far too small to find the global optimum, we can find nontrivial function values (e.g. values that achieve a certain ratio with the optimal value).
LGOct 27, 2020
On the Transfer of Disentangled Representations in Realistic SettingsAndrea Dittadi, Frederik Träuble, Francesco Locatello et al.
Learning meaningful representations that disentangle the underlying structure of the data generating process is considered to be of key importance in machine learning. While disentangled representations were found to be useful for diverse tasks such as abstract reasoning and fair classification, their scalability and real-world impact remain questionable. We introduce a new high-resolution dataset with 1M simulated images and over 1,800 annotated real-world images of the same setup. In contrast to previous work, this new dataset exhibits correlations, a complex underlying structure, and allows to evaluate transfer to unseen simulated and real-world settings where the encoder i) remains in distribution or ii) is out of distribution. We propose new architectures in order to scale disentangled representation learning to realistic high-resolution settings and conduct a large-scale empirical study of disentangled representations on this dataset. We observe that disentanglement is a good predictor for out-of-distribution (OOD) task performance.
LGOct 14, 2020
Function Contrastive Learning of Transferable Meta-RepresentationsMuhammad Waleed Gondal, Shruti Joshi, Nasim Rahaman et al.
Meta-learning algorithms adapt quickly to new tasks that are drawn from the same task distribution as the training tasks. The mechanism leading to fast adaptation is the conditioning of a downstream predictive model on the inferred representation of the task's underlying data generative process, or \emph{function}. This \emph{meta-representation}, which is computed from a few observed examples of the underlying function, is learned jointly with the predictive model. In this work, we study the implications of this joint training on the transferability of the meta-representations. Our goal is to learn meta-representations that are robust to noise in the data and facilitate solving a wide range of downstream tasks that share the same underlying functions. To this end, we propose a decoupled encoder-decoder approach to supervised meta-learning, where the encoder is trained with a contrastive objective to find a good representation of the underlying function. In particular, our training scheme is driven by the self-supervision signal indicating whether two sets of examples stem from the same function. Our experiments on a number of synthetic and real-world datasets show that the representations we obtain outperform strong baselines in terms of downstream performance and noise robustness, even when these baselines are trained in an end-to-end manner.
MLJun 7, 2019
On the Transfer of Inductive Bias from Simulation to the Real World: a New Disentanglement DatasetMuhammad Waleed Gondal, Manuel Wüthrich, Đorđe Miladinović et al.
Learning meaningful and compact representations with disentangled semantic aspects is considered to be of key importance in representation learning. Since real-world data is notoriously costly to collect, many recent state-of-the-art disentanglement models have heavily relied on synthetic toy data-sets. In this paper, we propose a novel data-set which consists of over one million images of physical 3D objects with seven factors of variation, such as object color, shape, size and position. In order to be able to control all the factors of variation precisely, we built an experimental platform where the objects are being moved by a robotic arm. In addition, we provide two more datasets which consist of simulations of the experimental setup. These datasets provide for the first time the possibility to systematically investigate how well different disentanglement methods perform on real data in comparison to simulation, and how simulated data can be leveraged to build better representations of the real world. We provide a first experimental study of these questions and our results indicate that learned models transfer poorly, but that model and hyperparameter selection is an effective means of transferring information to the real world.
ROMar 10, 2017
Real-time Perception meets Reactive Motion GenerationDaniel Kappler, Franziska Meier, Jan Issac et al.
We address the challenging problem of robotic grasping and manipulation in the presence of uncertainty. This uncertainty is due to noisy sensing, inaccurate models and hard-to-predict environment dynamics. We quantify the importance of continuous, real-time perception and its tight integration with reactive motion generation methods in dynamic manipulation scenarios. We compare three different systems that are instantiations of the most common architectures in the field: (i) a traditional sense-plan-act approach that is still widely used, (ii) a myopic controller that only reacts to local environment dynamics and (iii) a reactive planner that integrates feedback control and motion optimization. All architectures rely on the same components for real-time perception and reactive motion generation to allow a quantitative evaluation. We extensively evaluate the systems on a real robotic platform in four scenarios that exhibit either a challenging workspace geometry or a dynamic environment. In 333 experiments, we quantify the robustness and accuracy that is due to integrating real-time feedback at different time scales in a reactive motion generation system. We also report on the lessons learned for system building.
ROFeb 19, 2016
Depth-Based Object Tracking Using a Robust Gaussian FilterJan Issac, Manuel Wüthrich, Cristina Garcia Cifuentes et al.
We consider the problem of model-based 3D-tracking of objects given dense depth images as input. Two difficulties preclude the application of a standard Gaussian filter to this problem. First of all, depth sensors are characterized by fat-tailed measurement noise. To address this issue, we show how a recently published robustification method for Gaussian filters can be applied to the problem at hand. Thereby, we avoid using heuristic outlier detection methods that simply reject measurements if they do not match the model. Secondly, the computational cost of the standard Gaussian filter is prohibitive due to the high-dimensional measurement, i.e. the depth image. To address this problem, we propose an approximation to reduce the computational complexity of the filter. In quantitative experiments on real data we show how our method clearly outperforms the standard Gaussian filter. Furthermore, we compare its performance to a particle-filter-based tracking method, and observe comparable computational efficiency and improved accuracy and smoothness of the estimates.
MLSep 14, 2015
Robust Gaussian Filtering using a Pseudo MeasurementManuel Wüthrich, Cristina Garcia Cifuentes, Sebastian Trimpe et al.
Many sensors, such as range, sonar, radar, GPS and visual devices, produce measurements which are contaminated by outliers. This problem can be addressed by using fat-tailed sensor models, which account for the possibility of outliers. Unfortunately, all estimation algorithms belonging to the family of Gaussian filters (such as the widely-used extended Kalman filter and unscented Kalman filter) are inherently incompatible with such fat-tailed sensor models. The contribution of this paper is to show that any Gaussian filter can be made compatible with fat-tailed sensor models by applying one simple change: Instead of filtering with the physical measurement, we propose to filter with a pseudo measurement obtained by applying a feature function to the physical measurement. We derive such a feature function which is optimal under some conditions. Simulation results show that the proposed method can effectively handle measurement outliers and allows for robust filtering in both linear and nonlinear systems.
ROMay 1, 2015
The Coordinate Particle Filter - A novel Particle Filter for High Dimensional SystemsManuel Wüthrich, Jeannette Bohg, Daniel Kappler et al.
Parametric filters, such as the Extended Kalman Filter and the Unscented Kalman Filter, typically scale well with the dimensionality of the problem, but they are known to fail if the posterior state distribution cannot be closely approximated by a density of the assumed parametric form. For nonparametric filters, such as the Particle Filter, the converse holds. Such methods are able to approximate any posterior, but the computational requirements scale exponentially with the number of dimensions of the state space. In this paper, we present the Coordinate Particle Filter which alleviates this problem. We propose to compute the particle weights recursively, dimension by dimension. This allows us to explore one dimension at a time, and resample after each dimension if necessary. Experimental results on simulated as well as real data confirm that the proposed method has a substantial performance advantage over the Particle Filter in high-dimensional systems where not all dimensions are highly correlated. We demonstrate the benefits of the proposed method for the problem of multi-object and robotic manipulator tracking.
ROMay 1, 2015
Probabilistic Object Tracking using a Range CameraManuel Wüthrich, Peter Pastor, Mrinal Kalakrishnan et al.
We address the problem of tracking the 6-DoF pose of an object while it is being manipulated by a human or a robot. We use a dynamic Bayesian network to perform inference and compute a posterior distribution over the current object pose. Depending on whether a robot or a human manipulates the object, we employ a process model with or without knowledge of control inputs. Observations are obtained from a range camera. As opposed to previous object tracking methods, we explicitly model self-occlusions and occlusions from the environment, e.g, the human or robotic hand. This leads to a strongly non-linear observation model and additional dependencies in the Bayesian network. We employ a Rao-Blackwellised particle filter to compute an estimate of the object pose at every time step. In a set of experiments, we demonstrate the ability of our method to accurately and robustly track the object pose in real-time while it is being manipulated by a human or a robot.
ROApr 29, 2015
A New Perspective and Extension of the Gaussian FilterManuel Wüthrich, Sebastian Trimpe, Daniel Kappler et al.
The Gaussian Filter (GF) is one of the most widely used filtering algorithms; instances are the Extended Kalman Filter, the Unscented Kalman Filter and the Divided Difference Filter. GFs represent the belief of the current state by a Gaussian with the mean being an affine function of the measurement. We show that this representation can be too restrictive to accurately capture the dependences in systems with nonlinear observation models, and we investigate how the GF can be generalized to alleviate this problem. To this end, we view the GF from a variational-inference perspective. We analyse how restrictions on the form of the belief can be relaxed while maintaining simplicity and efficiency. This analysis provides a basis for generalizations of the GF. We propose one such generalization which coincides with a GF using a virtual measurement, obtained by applying a nonlinear function to the actual measurement. Numerical experiments show that the proposed Feature Gaussian Filter (FGF) can have a substantial performance advantage over the standard GF for systems with nonlinear observation models.
ROApr 29, 2015
Probabilistic Depth Image Registration incorporating Nonvisual InformationManuel Wüthrich, Peter Pastor, Ludovic Righetti et al.
In this paper, we derive a probabilistic registration algorithm for object modeling and tracking. In many robotics applications, such as manipulation tasks, nonvisual information about the movement of the object is available, which we will combine with the visual information. Furthermore we do not only consider observations of the object, but we also take space into account which has been observed to not be part of the object. Furthermore we are computing a posterior distribution over the relative alignment and not a point estimate as typically done in for example Iterative Closest Point (ICP). To our knowledge no existing algorithm meets these three conditions and we thus derive a novel registration algorithm in a Bayesian framework. Experimental results suggest that the proposed methods perform favorably in comparison to PCL implementations of feature mapping and ICP, especially if nonvisual information is available.