RO · Mar 2, 2021

Learning Robotic Manipulation Tasks via Task Progress based Gaussian Reward and Loss Adjusted Exploration

arXiv:2103.01434v210.417 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of enabling robots to perform complex manipulation tasks more effectively, though it appears incremental as it builds on existing deep reinforcement learning approaches.

The authors tackled the challenge of learning multi-step robotic manipulation tasks in unstructured environments by proposing a model-free deep reinforcement learning method with a Task Progress based Gaussian reward function and Loss Adjusted Exploration policy, achieving state-of-the-art performance in success rate and action efficiency.

Multi-step manipulation tasks in unstructured environments are extremely challenging for a robot to learn. Such tasks interlace high-level reasoning, which determines the expected states that must be attained to achieve the overall task, with low-level reasoning, which decides what actions will yield these states. We propose a model-free deep reinforcement learning method to learn multi-step manipulation tasks. We introduce a Robotic Manipulation Network (RoManNet), a vision-based model architecture, to learn the action-value functions and predict manipulation action candidates. We define a Task Progress based Gaussian (TPG) reward function that computes the reward based on actions that lead to successful motion primitives and progress towards the overall task goal. To balance exploration and exploitation, we introduce a Loss Adjusted Exploration (LAE) policy that selects actions from the action candidates according to the Boltzmann distribution of loss estimates. We demonstrate the effectiveness of our approach by training RoManNet to learn several challenging multi-step robotic manipulation tasks both in simulation and in the real world. Experimental results show that our method outperforms existing methods and achieves state-of-the-art performance in terms of success rate and action efficiency. The ablation studies show that TPG and LAE are especially beneficial for tasks like multiple block stacking. Code is available at: https://github.com/skumra/romannet
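The abstract does not spell out how loss estimates enter the LAE policy, so the following is only a minimal sketch of Boltzmann (softmax) action selection under one stated assumption: the recent training loss is used as the temperature, so a high loss spreads probability across candidates (more exploration) and a low loss concentrates it on the highest-valued candidate (more exploitation). The names `boltzmann_probs`, `select_action`, and `min_temperature` are hypothetical and are not taken from the paper or its repository.

```python
import math
import random


def boltzmann_probs(scores, temperature):
    """Return a Boltzmann (softmax) distribution over scores at a given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, loss_estimate, rng=random, min_temperature=1e-3):
    """Sample an action index from candidate Q-values.

    ASSUMPTION (not from the paper): the loss estimate is used directly as the
    softmax temperature, clamped below by min_temperature.
    """
    temperature = max(loss_estimate, min_temperature)
    probs = boltzmann_probs(q_values, temperature)
    action = rng.choices(range(len(q_values)), weights=probs, k=1)[0]
    return action, probs


if __name__ == "__main__":
    candidates = [0.1, 0.5, 0.9]  # illustrative Q-values for three action candidates
    for loss in (5.0, 0.5, 0.01):
        action, probs = select_action(candidates, loss)
        print(f"loss={loss}: action={action}, probs={[round(p, 3) for p in probs]}")
```

Running the script prints how the selection probabilities sharpen as the loss falls. For the authors' actual formulation, consult the paper and the linked repository.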
