RO · AI · Feb 12, 2025

CordViP: Correspondence-based Visuomotor Policy for Dexterous Manipulation in Real-World

arXiv:2502.08449v2 · 17 citations · h-index: 17 · Robotics
Originality: Incremental advance
AI Analysis

This work addresses fine-grained dexterous manipulation in robotics and represents an incremental improvement over existing 3D-based imitation learning methods.

The paper targets human-level dexterity in robots by addressing two limitations of 3D representations for manipulation: single-view point clouds degraded by occlusion, and global point clouds that lack contact information. It proposes CordViP, which constructs correspondences between objects and hands using 6D pose estimation and proprioception. The authors report state-of-the-art performance on six real-world tasks, with better generalization and robustness than the baselines.

Achieving human-level dexterity in robots is a key objective in the field of robotic manipulation. Recent advancements in 3D-based imitation learning have shown promising results, providing an effective pathway to achieve this goal. However, obtaining high-quality 3D representations presents two key problems: (1) the quality of point clouds captured by a single-view camera is significantly affected by factors such as camera resolution, positioning, and occlusions caused by the dexterous hand; (2) the global point clouds lack crucial contact information and spatial correspondences, which are necessary for fine-grained dexterous manipulation tasks. To eliminate these limitations, we propose CordViP, a novel framework that constructs and learns correspondences by leveraging the robust 6D pose estimation of objects and robot proprioception. Specifically, we first introduce the interaction-aware point clouds, which establish correspondences between the object and the hand. These point clouds are then used for our pre-training policy, where we also incorporate object-centric contact maps and hand-arm coordination information, effectively capturing both spatial and temporal dynamics. Our method demonstrates exceptional dexterous manipulation capabilities, achieving state-of-the-art performance in six real-world tasks, surpassing other baselines by a large margin. Experimental results also highlight the superior generalization and robustness of CordViP to different objects, viewpoints, and scenarios. Code and videos are available on https://aureleopku.github.io/CordViP.
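The abstract's idea of building an object-hand correspondence from a 6D pose and proprioception can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the object's pose is already given as a rotation matrix and translation, and that the hand surface points have already been computed from joint readings by forward kinematics. The names `transform_point` and `contact_map`, the exponential distance score, and the `scale` value are all hypothetical choices made for illustration. Plain Python is used to keep the example dependency-free.

```python
import math

def transform_point(point, rotation, translation):
    """Apply a rigid transform: rotate by a 3x3 matrix, then translate."""
    return tuple(
        sum(rotation[i][k] * point[k] for k in range(3)) + translation[i]
        for i in range(3)
    )

def contact_map(object_points, hand_points, scale=0.05):
    """Score each object point in (0, 1] by distance to its nearest hand point.

    The exponential falloff and the 5 cm scale are assumptions, not the
    paper's definition of an object-centric contact map.
    """
    return [
        math.exp(-min(math.dist(p, h) for h in hand_points) / scale)
        for p in object_points
    ]

# Hypothetical inputs: points sampled from an object model (object frame),
# an estimated 6D pose (identity rotation plus a translation), and hand
# points assumed to come from forward kinematics on joint readings.
object_model = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0)]
rotation = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
translation = (0.5, 0.0, 0.2)
hand_points = [(0.5, 0.0, 0.2), (0.9, 0.0, 0.2)]

# Place the object model in the camera frame using the estimated pose.
object_in_camera = [
    transform_point(p, rotation, translation) for p in object_model
]

# Per-point contact scores: 1.0 where an object point touches the hand.
scores = contact_map(object_in_camera, hand_points)

# A combined, labeled cloud: complete object and hand geometry in one frame,
# independent of what a single camera view can actually see.
interaction_cloud = (
    [(p, "object") for p in object_in_camera]
    + [(h, "hand") for h in hand_points]
)
```

Because both point sets are generated from models rather than from raw depth, occlusion by the hand does not remove points from the cloud. That is the motivation the abstract gives for relying on pose estimation and proprioception.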

