CV LG RO | Nov 23, 2022

Learning to Imitate Object Interactions from Internet Videos

Austin Patel, Andrew Wang, Ilija Radosavovic, Jitendra Malik

arXiv:2211.13225v1 | 17.937 citations | h-index: 63

Originality: Incremental advance

AI Analysis

This work addresses the challenge of learning robotic manipulation skills from unstructured video data. The advance is incremental because it builds on existing reconstruction and imitation learning techniques.

The paper tackles the problem of imitating object interactions from Internet videos. It develops a method to reconstruct 4D hand-object trajectories and uses reinforcement learning in a physics simulator to imitate them. The reconstruction is applied to 100 videos, and the imitation is shown to work with different robotic embodiments.

We study the problem of imitating object interactions from Internet videos. This requires understanding the hand-object interactions in 4D, spatially in 3D and over time, which is challenging due to mutual hand-object occlusions. In this paper we make two main contributions: (1) a novel reconstruction technique RHOV (Reconstructing Hands and Objects from Videos), which reconstructs 4D trajectories of both the hand and the object using 2D image cues and temporal smoothness constraints; (2) a system for imitating object interactions in a physics simulator with reinforcement learning. We apply our reconstruction technique to 100 challenging Internet videos. We further show that we can successfully imitate a range of different object interactions in a physics simulator. Our object-centric approach is not limited to human-like end-effectors and can learn to imitate object interactions using different embodiments, like a robotic arm with a parallel jaw gripper.
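
The abstract mentions temporal smoothness constraints on the reconstructed trajectories but does not give their form. As a minimal sketch only, one common way to express such a constraint is to penalize the discrete acceleration of a trajectory. The function name `smoothness_penalty`, the (T, 3) position layout, and the second-difference form are all assumptions made for illustration, not the RHOV formulation.

```python
import numpy as np


def smoothness_penalty(traj):
    """Sum of squared second differences of a (T, 3) position trajectory.

    Hypothetical illustration: a generic smoothness term, not the
    loss used in the paper.
    """
    traj = np.asarray(traj, dtype=float)
    # Discrete acceleration: x[t+1] - 2*x[t] + x[t-1]
    accel = traj[2:] - 2.0 * traj[1:-1] + traj[:-2]
    return float(np.sum(accel ** 2))


# Constant-velocity motion along a straight line has (near) zero penalty.
t = np.linspace(0.0, 1.0, 50)
smooth = np.stack([t, t, t], axis=1)

# Adding per-frame jitter, as independent per-frame estimates might, raises it.
rng = np.random.default_rng(0)
jittery = smooth + 0.01 * rng.standard_normal(smooth.shape)

print(smoothness_penalty(smooth) < smoothness_penalty(jittery))
```

In an optimization-based reconstruction, a term of this kind would typically be added to the 2D image-cue terms with a weight, so that frames where the hand or object is occluded are pulled toward the motion implied by neighboring frames.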
