RO AI CV
Aug 19, 2025

MimicFunc: Imitating Tool Manipulation from a Single Human Video via Functional Correspondence

arXiv:2508.13534v1
10 citations
h-index: 7
Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of scalable robot skill learning from human demonstrations and reduces reliance on labor-intensive data collection, though its improvement in generalization over existing methods is incremental.

The paper tackles the problem of enabling robots to imitate tool manipulation from a single human video by establishing functional correspondences to handle geometric variations among tools, achieving generalization to novel tools for equivalent tasks without requiring teleoperation data.

Imitating tool manipulation from human videos offers an intuitive approach to teaching robots, while also providing a promising and scalable alternative to labor-intensive teleoperation data collection for visuomotor policy learning. Humans can mimic tool manipulation behavior after observing others perform a task just once and effortlessly transfer the skill to diverse tools for functionally equivalent tasks, but current robots struggle to achieve this level of generalization. A key challenge lies in establishing function-level correspondences, considering the significant geometric variations among functionally similar tools, referred to as intra-function variations. To address this challenge, we propose MimicFunc, a framework that establishes functional correspondences using a function frame, a function-centric local coordinate frame constructed with keypoint-based abstraction, to imitate tool manipulation skills. Experiments demonstrate that MimicFunc effectively enables the robot to generalize the skill from a single RGB-D human video to manipulating novel tools for functionally equivalent tasks. Furthermore, leveraging MimicFunc's one-shot generalization capability, the generated rollouts can be used to train visuomotor policies without requiring labor-intensive teleoperation data collection for novel objects. Our code and video are available at https://sites.google.com/view/mimicfunc.
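
The abstract describes the function frame only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes a local frame can be built from two 3D keypoints plus an orientation hint, and that a skill is transferred by making the new tool's frame coincide with the demonstrated one. The helper names `frame_from_keypoints` and `retarget_gripper_pose`, the choice of keypoints, and all numeric values are hypothetical.

```python
import numpy as np


def frame_from_keypoints(origin, axis_point, up_hint=(0.0, 0.0, 1.0)):
    """Build a 4x4 homogeneous frame from two keypoints (illustrative assumption).

    The frame origin sits at `origin`, its x-axis points toward `axis_point`,
    and its z-axis is `up_hint` made orthogonal to x (Gram-Schmidt).
    """
    origin = np.asarray(origin, dtype=float)
    x = np.asarray(axis_point, dtype=float) - origin
    x = x / np.linalg.norm(x)

    up = np.asarray(up_hint, dtype=float)
    z = up - np.dot(up, x) * x  # remove the component of up along x
    z_norm = np.linalg.norm(z)
    if z_norm < 1e-8:
        raise ValueError("up_hint is parallel to the keypoint axis")
    z = z / z_norm

    y = np.cross(z, x)  # completes a right-handed basis (x cross y = z)

    T = np.eye(4)
    T[:3, 0], T[:3, 1], T[:3, 2], T[:3, 3] = x, y, z, origin
    return T


def retarget_gripper_pose(world_T_function, gripper_T_function):
    """Gripper pose in the world that places the held tool's frame on the demo frame."""
    return world_T_function @ np.linalg.inv(gripper_T_function)


# Frame observed for the demonstrated tool, expressed in world coordinates.
demo_frame = frame_from_keypoints([0.5, 0.0, 0.2], [0.5, 0.0, 0.4], up_hint=(1.0, 0.0, 0.0))

# Frame of a differently shaped tool, expressed relative to the robot gripper.
new_tool_frame = frame_from_keypoints([0.0, 0.0, 0.15], [0.0, 0.0, 0.05], up_hint=(0.0, 1.0, 0.0))

gripper_pose = retarget_gripper_pose(demo_frame, new_tool_frame)
print(np.round(gripper_pose, 3))
```

Applying `retarget_gripper_pose` to every frame of a demonstrated trajectory would yield a gripper trajectory for the new tool. Because the alignment acts on the local frame rather than on the full tool geometry, differences in overall tool shape do not enter the computation in this sketch.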

