C-3PO: Cyclic-Three-Phase Optimization for Human-Robot Motion Retargeting based on Reinforcement Learning
This paper addresses the problem of generalizing motion retargeting across heterogeneous robots, though the contribution appears incremental because it builds on existing reinforcement learning and optimization techniques.
The paper tackles motion retargeting between humans and robots with different kinematics by developing a cyclic three-phase optimization method based on deep reinforcement learning, and it successfully learns retargeting skills for multiple robots such as NAO and Pepper.
Motion retargeting between heterogeneous polymorphs with different sizes and kinematic configurations requires a comprehensive knowledge of (inverse) kinematics. Moreover, it is non-trivial to provide a general, kinematics-independent solution. In this study, we developed a cyclic three-phase optimization method based on deep reinforcement learning for human-robot motion retargeting. The motion retargeting policy is learned using refined data in a latent space through the cyclic and filtering paths of our method. In addition, the human-in-the-loop three-phase approach provides a framework for improving the motion retargeting policy in both quantitative and qualitative manners. Using the proposed C-3PO method, we successfully learned the motion retargeting skill between the human skeleton and the motions of multiple robots such as NAO, Pepper, Baxter, and C-3PO.
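To illustrate why retargeting between bodies of different sizes depends on (inverse) kinematics, here is a minimal sketch using a planar two-link arm. It is not the paper's reinforcement learning method. It is a hand-written analytic baseline, and every name in it (`forward_kinematics`, `inverse_kinematics`, `retarget`, the link lengths) is a hypothetical choice made for this example. The sketch assumes the goal is to reproduce the source end-effector position, scaled by the ratio of total arm lengths.

```python
import math


def forward_kinematics(l1, l2, q1, q2):
    """End-effector (x, y) of a planar two-link arm with link lengths l1, l2."""
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return x, y


def inverse_kinematics(l1, l2, x, y):
    """One joint solution (with q2 >= 0) that reaches (x, y).

    Raises ValueError if the point is outside the arm's reachable annulus.
    """
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(c2) > 1.0 + 1e-9:
        raise ValueError("target is out of reach for this arm")
    c2 = max(-1.0, min(1.0, c2))  # guard against floating-point overshoot
    q2 = math.acos(c2)
    q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2), l1 + l2 * math.cos(q2))
    return q1, q2


def retarget(src_links, dst_links, q_src):
    """Map source joint angles to target joint angles (illustrative assumption:
    match the end-effector position scaled by the ratio of total arm lengths)."""
    x, y = forward_kinematics(*src_links, *q_src)
    scale = sum(dst_links) / sum(src_links)
    return inverse_kinematics(*dst_links, x * scale, y * scale)


# Hypothetical link lengths in meters: a human-like arm and a smaller robot arm.
human_arm = (0.30, 0.25)
robot_arm = (0.10, 0.08)
human_pose = (0.4, 0.9)

robot_pose = retarget(human_arm, robot_arm, human_pose)
```

Copying `human_pose` directly onto `robot_arm` places the hand somewhere other than the scaled target, because the two arms have different link-length ratios. The hand-derived inverse kinematics above is exactly the kind of robot-specific knowledge that becomes hard to write and maintain as the number of joints and robot types grows.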