RO · AI · LG · Sep 10, 2024

One-Shot Imitation under Mismatched Execution

arXiv:2409.06615v6 · 11 citations · h-index: 9
Originality: Highly original
AI Analysis

This work addresses the challenge of scalable robot programming from human demonstrations without paired data; the contribution is incremental but impactful for robotics.

The paper tackles the problem of translating human demonstrations into robot-executable actions despite mismatches in movement style and physical capability, achieving an increase of over 50% in task success compared with previous methods.

Human demonstrations as prompts are a powerful way to program robots to do long-horizon manipulation tasks. However, translating these demonstrations into robot-executable actions presents significant challenges due to execution mismatches in movement styles and physical capabilities. Existing methods for human-robot translation either depend on paired data, which is infeasible to scale, or rely heavily on frame-level visual similarities that often break down in practice. To address these challenges, we propose RHyME, a novel framework that automatically pairs human and robot trajectories using sequence-level optimal transport cost functions. Given long-horizon robot demonstrations, RHyME synthesizes semantically equivalent human videos by retrieving and composing short-horizon human clips. This approach facilitates effective policy training without the need for paired data. RHyME successfully imitates a range of cross-embodiment demonstrators, both in simulation and with a real human hand, achieving over 50% increase in task success compared to previous methods. We release our code and datasets at https://portal-cornell.github.io/rhyme/.
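The abstract names two mechanisms: a sequence-level optimal transport (OT) cost for scoring how well a human clip matches a robot trajectory, and retrieval plus composition of short human clips into a longer synthetic video. The sketch below illustrates both ideas under stated assumptions and is not the authors' implementation (their code is at the URL above). It assumes every video frame has already been embedded as a vector by some visual encoder (not shown), uses cosine distance as the ground cost, uniform mass per frame, and entropy-regularized OT solved with Sinkhorn iterations. It also assumes the robot demonstration is already split into segments and picks the cheapest clip for each segment greedily. The names `sinkhorn_ot_cost`, `retrieve_and_compose`, and `eps` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity between two frame embeddings (assumed ground cost)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb + 1e-12)  # small constant avoids division by zero


def sinkhorn_ot_cost(xs: List[Vector], ys: List[Vector],
                     eps: float = 0.1, iters: int = 200) -> float:
    """Hypothetical sequence-level cost: entropic OT between two sequences of
    frame embeddings with uniform marginals (an assumption, not RHyME's exact cost)."""
    n, m = len(xs), len(ys)
    C = [[cosine_distance(x, y) for y in ys] for x in xs]
    K = [[math.exp(-c / eps) for c in row] for row in C]  # Gibbs kernel
    a, b = [1.0 / n] * n, [1.0 / m] * m                    # uniform frame masses
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternate scaling so the plan's row sums match a and column sums match b
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v); the cost is <P, C>
    return sum(u[i] * K[i][j] * v[j] * C[i][j]
               for i in range(n) for j in range(m))


def retrieve_and_compose(robot_segments, human_clips, eps: float = 0.1):
    """Hypothetical greedy retrieval: for each robot segment, choose the human
    clip with the lowest OT cost, then concatenate the chosen clips' frames."""
    chosen, composed = [], []
    for seg in robot_segments:
        costs = [sinkhorn_ot_cost(seg, clip, eps) for clip in human_clips]
        best = min(range(len(human_clips)), key=costs.__getitem__)
        chosen.append(best)
        composed.extend(human_clips[best])
    return chosen, composed


# Toy 2-D "embeddings": robot segment 0 points along x, segment 1 along y.
robot = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]
human = [[[0.0, 1.0], [0.05, 1.0]], [[1.0, 0.05], [1.0, 0.0]]]
chosen, video = retrieve_and_compose(robot, human)
print(chosen)  # [1, 0]
```

Because OT compares whole sequences rather than individual frames, a clip can match a segment even when no single frame pair is a near-duplicate. This is the kind of robustness the abstract contrasts with frame-level visual similarity. A real system would likely use learned embeddings and a batched solver instead of this pure-Python loop.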
