RO, AI, CV | Sep 12, 2024

Hand-Object Interaction Pretraining from Videos

arXiv:2409.08273v1 | 47 citations | h-index: 25
Originality: Incremental advance
AI Analysis

This work addresses sample-efficient learning of robot manipulation. It is an incremental advance that builds on existing video-based learning and motion-retargeting techniques.

The paper learns robot manipulation priors from 3D hand-object interaction videos by retargeting human motions to robot actions. The result is a task-agnostic base policy that, compared with prior methods, improves sample efficiency, robustness, and generalization on downstream tasks.

We present an approach to learn general robot manipulation priors from 3D hand-object interaction trajectories. We build a framework to use in-the-wild videos to generate sensorimotor robot trajectories. We do so by lifting both the human hand and the manipulated object in a shared 3D space and retargeting human motions to robot actions. Generative modeling on this data gives us a task-agnostic base policy. This policy captures a general yet flexible manipulation prior. We empirically demonstrate that finetuning this policy, with both reinforcement learning (RL) and behavior cloning (BC), enables sample-efficient adaptation to downstream tasks and simultaneously improves robustness and generalizability compared to prior approaches. Qualitative experiments are available at: https://hgaurav2k.github.io/hop/
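To make the retargeting step more concrete, here is a minimal sketch of how 3D hand keypoints lifted from video might be converted into robot actions. It is not the authors' method: this summary gives no retargeting details, and the sketch deliberately simplifies to a parallel-jaw gripper. Everything in it is a hypothetical assumption: a known rigid camera-to-robot transform `(rot, trans)`, wrist-position deltas used as end-effector actions, a thumb-to-index aperture mapped to a binary gripper command, and the names `retarget_trajectory` and `grasp_threshold`.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def transform_point(p: Vec3, rot: Mat3, trans: Vec3) -> Vec3:
    """Apply a rigid transform x' = R x + t (assumed camera frame -> robot base frame)."""
    return tuple(
        sum(rot[i][j] * p[j] for j in range(3)) + trans[i] for i in range(3)
    )


def distance(a: Vec3, b: Vec3) -> float:
    """Euclidean distance between two 3D points."""
    return sum((a[i] - b[i]) ** 2 for i in range(3)) ** 0.5


def retarget_trajectory(
    wrist: Sequence[Vec3],
    thumb_tip: Sequence[Vec3],
    index_tip: Sequence[Vec3],
    rot: Mat3,
    trans: Vec3,
    grasp_threshold: float = 0.04,  # hypothetical aperture threshold in metres
) -> List[Tuple[Vec3, float]]:
    """Hypothetical retargeting: wrist motion -> end-effector delta action,
    thumb-index aperture -> binary gripper command (1.0 = close, 0.0 = open)."""
    # Express every wrist position in the robot base frame.
    ee = [transform_point(p, rot, trans) for p in wrist]
    actions = []
    for t in range(len(ee) - 1):
        # Action at step t is the displacement to the next end-effector position.
        delta = tuple(ee[t + 1][i] - ee[t][i] for i in range(3))
        # Treat a small thumb-index gap as a grasp.
        aperture = distance(thumb_tip[t], index_tip[t])
        gripper = 1.0 if aperture < grasp_threshold else 0.0
        actions.append((delta, gripper))
    return actions
```

A trajectory of T keypoint frames yields T - 1 (delta, gripper) pairs, which could then serve as state-action data for a generative policy. A real pipeline for a dexterous robot hand would more likely solve inverse kinematics for the full hand pose than threshold a single aperture.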

