RO, HC, LG · Apr 10, 2023

Learning a Universal Human Prior for Dexterous Manipulation from Human Preference

Zihan Ding, Yuanpei Chen, Allen Z. Ren, Shixiang Shane Gu, Qianxu Wang, Hao Dong, Chi Jin

Baidu

arXiv:2304.04602v2 · 10.3 · 11 citations · h-index: 46

Originality: Incremental advance

AI Analysis

This work addresses unnatural robot motions in high-dimensional control tasks. It is an incremental advance because it builds on existing reinforcement learning from human feedback (RLHF) methods.

The researchers tackle the challenge of generating human-like robot behavior in dexterous manipulation by learning a universal human prior from human preference feedback over videos, without any demonstrations. Applied to 20 dual-hand robot manipulation tasks in simulation, the prior yields more human-like behaviors, even on unseen tasks.

Generating human-like behavior on robots is a great challenge, especially in dexterous manipulation tasks with robotic hands. Scripting policies from scratch is intractable due to the high-dimensional control space, and training policies with reinforcement learning (RL) and manual reward engineering can also be hard and lead to unnatural motions. Leveraging recent progress in RL from Human Feedback, we propose a framework that learns a universal human prior using direct human preference feedback over videos, for efficiently tuning RL policies on 20 dual-hand robot manipulation tasks in simulation, without a single human demonstration. A task-agnostic reward model is trained by iteratively generating diverse policies and collecting human preferences over their trajectories; it is then applied to regularize the behavior of policies in the fine-tuning stage. Our method empirically demonstrates more human-like behaviors on robot hands in diverse tasks, including unseen tasks, indicating its generalization capability.
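The abstract does not state the reward model's training objective. A common choice in RLHF for turning pairwise human preferences into a reward model is the Bradley-Terry model, where the probability that trajectory A is preferred over B is sigmoid(R(A) - R(B)), with R the summed per-step reward. The sketch below illustrates that idea together with a regularized fine-tuning reward (task reward plus a weighted prior term). Everything here is an assumption for illustration, not the authors' implementation: the linear reward over hypothetical per-step feature vectors, plain SGD, the learning rate, and the weight `lam` are all made up.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical representation: one feature vector per time step.
Trajectory = List[List[float]]


def step_reward(w: Sequence[float], feat: Sequence[float]) -> float:
    """Linear per-step reward r(s) = w . phi(s) (a stand-in for a learned network)."""
    return sum(wi * fi for wi, fi in zip(w, feat))


def traj_return(w: Sequence[float], traj: Trajectory) -> float:
    """Sum of per-step rewards over a trajectory."""
    return sum(step_reward(w, f) for f in traj)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def train_reward_model(
    pairs: Sequence[Tuple[Trajectory, Trajectory, float]],
    dim: int,
    lr: float = 0.05,
    epochs: int = 200,
    seed: int = 0,
) -> List[float]:
    """Fit w by minimizing Bradley-Terry cross-entropy on preference labels.

    Each pair is (traj_a, traj_b, label), label = 1.0 if a human preferred a, else 0.0.
    """
    rng = random.Random(seed)
    data = list(pairs)  # copy so the caller's list is not shuffled in place
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(data)
        for a, b, label in data:
            p = sigmoid(traj_return(w, a) - traj_return(w, b))  # P(a preferred over b)
            sum_a = [sum(f[i] for f in a) for i in range(dim)]
            sum_b = [sum(f[i] for f in b) for i in range(dim)]
            # d(loss)/dw = (p - label) * (sum of features of a - sum of features of b)
            for i in range(dim):
                w[i] -= lr * (p - label) * (sum_a[i] - sum_b[i])
    return w


def finetune_reward(task_r: float, w: Sequence[float], feat: Sequence[float], lam: float = 0.5) -> float:
    """Regularized fine-tuning reward: task reward plus a weighted human-prior term."""
    return task_r + lam * step_reward(w, feat)


if __name__ == "__main__":
    # Toy data: feature 0 ~ "smooth motion", feature 1 ~ "jerky motion" (hypothetical).
    smooth = [[1.0, 0.0]] * 3
    jerky = [[0.0, 1.0]] * 3
    w = train_reward_model([(smooth, jerky, 1.0), (jerky, smooth, 0.0)], dim=2)
    print("learned weights:", w)
    print("P(smooth > jerky):", sigmoid(traj_return(w, smooth) - traj_return(w, jerky)))
```

In this toy run the learned weights favor the "smooth" feature and penalize the "jerky" one. Because the prior is trained once on preferences and reused as an additive term, the same weights can regularize fine-tuning on any task, which mirrors the task-agnostic role the abstract describes.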
