RO · CV · LG · Nov 13, 2020

Learning Object Manipulation Skills via Approximate State Estimation from Real Videos

arXiv:2011.06813v1 · 24 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of efficient skill acquisition for robots. Its contribution is incremental, since it builds on existing visual recognition and reinforcement learning methods.

The paper tackles the problem of enabling robots to learn object manipulation skills from videos, reducing the need for extensive trial-and-error or expert demonstrations, and demonstrates that policies learned in simulation can be transferred to a real robot.

Humans are adept at learning new tasks by watching a few instructional videos. On the other hand, robots that learn new actions either require a lot of effort through trial and error, or use expert demonstrations that are challenging to obtain. In this paper, we explore a method that facilitates learning object manipulation skills directly from videos. Leveraging recent advances in 2D visual recognition and differentiable rendering, we develop an optimization-based method to estimate a coarse 3D state representation for the hand and the manipulated object(s) without requiring any supervision. We use these trajectories as dense rewards for an agent that learns to mimic them through reinforcement learning. We evaluate our method on simple single- and two-object actions from the Something-Something dataset. Our approach allows an agent to learn actions from single videos, while watching multiple demonstrations makes the policy more robust. We show that policies learned in a simulated environment can be easily transferred to a real robot.
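
The idea of using an estimated trajectory as a dense reward can be illustrated with a minimal sketch. This is not the paper's reward function, which the abstract does not specify. It assumes that each state is a flat tuple of 3D coordinates (for example, hand position followed by object position), and that the reward decays exponentially with the Euclidean distance between the simulated state and the reference state at the same time step. The names `dense_tracking_reward`, `episode_return`, and the `scale` parameter are hypothetical.

```python
import math


def dense_tracking_reward(sim_state, ref_state, scale=1.0):
    """Per-step reward in (0, 1]; equals 1.0 when the states coincide.

    Assumption: both states are equal-length sequences of floats, and an
    exponential of the negative Euclidean distance is used as the reward.
    """
    dist = math.dist(sim_state, ref_state)
    return math.exp(-scale * dist)


def episode_return(sim_traj, ref_traj, scale=1.0):
    """Sum of per-step rewards over time-aligned trajectories."""
    return sum(
        dense_tracking_reward(s, r, scale)
        for s, r in zip(sim_traj, ref_traj)
    )


# Hypothetical reference trajectory estimated from a video: (x, y, z) per step.
reference = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0)]
# A simulated rollout that drifts away from the reference along y.
rollout = [(0.0, 0.0, 0.0), (0.1, 0.1, 0.0), (0.2, 0.3, 0.0)]

perfect = episode_return(reference, reference)
drifting = episode_return(rollout, reference)
```

Because every time step contributes a reward, the agent gets feedback throughout the episode rather than only on task completion, which is what makes the signal dense.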

Code Implementations: 1 repo