RO · Oct 29, 2020

"What, not how": Solving an under-actuated insertion task from scratch

Giulia Vezzani, Michael Neunert, Markus Wulfmeier, Rae Jeong, Thomas Lampe, Noah Siegel, Roland Hafner, Abbas Abdolmaleki, Martin Riedmiller, Francesco Nori

arXiv:2010.15492v2 · 4.11 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of enabling robots to learn complex, multi-skill manipulation tasks autonomously. Its contribution is incremental because it builds on existing multi-task RL frameworks.

The paper tackles the challenge of teaching a robot to solve a complex peg-in-hole insertion task from scratch. The task requires learning and combining multiple manipulation skills without explicit rewards for intermediate steps such as peg re-orientation. The approach solves the task both in simulation and on a single robot, where data is severely limited.

Robot manipulation requires a complex set of skills that must be carefully combined and coordinated to solve a task. Yet most Reinforcement Learning (RL) approaches in robotics study tasks that consist of only a single manipulation skill, such as grasping an object or inserting a pre-grasped object. As a result, the skill ('how' to solve the task) is specified, but the actual goal of a complete manipulation ('what' to solve) is not. In contrast, we study a complex manipulation goal that requires an agent to learn and combine diverse manipulation skills. We propose a challenging, highly under-actuated peg-in-hole task with a free, rotationally asymmetric peg that requires a broad range of manipulation skills. While correct peg (re-)orientation is a prerequisite for successful insertion, no reward is associated with it. Hence an agent needs to understand this precondition and learn the skill to fulfil it. The final insertion reward is sparse, which leaves freedom in the solution and leads to complex emergent behaviour not envisioned during task design. We tackle the problem in a multi-task RL framework using Scheduled Auxiliary Control (SAC-X) combined with Regularized Hierarchical Policy Optimization (RHPO), which solves the task both in simulation and from scratch on a single robot, where data is severely limited.
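The "what, not how" reward design can be illustrated with a small sketch. It assumes a SAC-X-style setup in which the main task has only a sparse insertion reward and a few auxiliary intentions share the same experience. The `State` fields, the thresholds, and the auxiliary tasks `grasp` and `lift` are hypothetical placeholders and are not taken from the paper. The scheduler is the uniform-random variant (SAC-U) from the original SAC-X work, which may differ from what the authors used. The RHPO hierarchical policy is not shown.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical, simplified state; the paper's real observation space differs.
@dataclass
class State:
    peg_depth: float       # depth of the peg tip inside the hole (m), 0 if outside
    peg_height: float      # height of the peg above the table (m)
    gripper_closed: bool   # whether the gripper is holding the peg


INSERTION_DEPTH = 0.02  # hypothetical success threshold (m)
LIFT_HEIGHT = 0.05      # hypothetical lift threshold (m)


def insertion_reward(s: State) -> float:
    """Sparse main-task reward: 1 only once the peg is inserted deeply enough."""
    return 1.0 if s.peg_depth >= INSERTION_DEPTH else 0.0


def grasp_reward(s: State) -> float:
    """Hypothetical auxiliary reward for holding the peg."""
    return 1.0 if s.gripper_closed else 0.0


def lift_reward(s: State) -> float:
    """Hypothetical auxiliary reward for holding the peg above the table."""
    return 1.0 if s.gripper_closed and s.peg_height > LIFT_HEIGHT else 0.0


# Note: there is deliberately no reward term for peg orientation.
TASKS: Dict[str, Callable[[State], float]] = {
    "grasp": grasp_reward,
    "lift": lift_reward,
    "insert": insertion_reward,
}


def uniform_schedule(episode_steps: int, switch_every: int,
                     rng: random.Random) -> List[str]:
    """SAC-U-style scheduler: pick an intention uniformly every `switch_every` steps."""
    names = list(TASKS)
    schedule: List[str] = []
    for t in range(episode_steps):
        if t % switch_every == 0:
            current = rng.choice(names)  # always set at t == 0
        schedule.append(current)
    return schedule


def relabel(trajectory: List[State]) -> Dict[str, List[float]]:
    """Score every visited state under every task's reward, so all intentions
    can learn off-policy from the same data regardless of which one acted."""
    return {name: [fn(s) for s in trajectory] for name, fn in TASKS.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    print(uniform_schedule(episode_steps=6, switch_every=3, rng=rng))
    traj = [State(0.0, 0.0, False), State(0.0, 0.1, True), State(0.03, 0.0, True)]
    print(relabel(traj)["insert"])  # [0.0, 0.0, 1.0]
```

In this sketch the insertion intention gets a non-zero signal only at the final state. Data gathered while the scheduler executes the auxiliary intentions can still be reused for it, and this is the mechanism SAC-X relies on to make sparse goals learnable.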

