RO, CV, LG | Jun 3, 2025

Grasp2Grasp: Vision-Based Dexterous Grasp Translation via Schrödinger Bridges

Princeton
arXiv:2506.02489v2 | 1 citation | h-index: 7
AI Analysis

This work enables semantic grasp transfer for heterogeneous manipulators, bridging vision-based grasping with probabilistic generative modeling, but it appears incremental as it builds on existing methods like Schrödinger Bridges.

The authors tackled the problem of transferring grasp intent across robotic hands with different morphologies using visual observations, achieving stable and physically grounded grasps with strong generalization in experiments.

We propose a new approach to vision-based dexterous grasp translation, which aims to transfer grasp intent across robotic hands with differing morphologies. Given a visual observation of a source hand grasping an object, our goal is to synthesize a functionally equivalent grasp for a target hand without requiring paired demonstrations or hand-specific simulations. We frame this problem as a stochastic transport between grasp distributions using the Schrödinger Bridge formalism. Our method learns to map between source and target latent grasp spaces via score and flow matching, conditioned on visual observations. To guide this translation, we introduce physics-informed cost functions that encode alignment in base pose, contact maps, wrench space, and manipulability. Experiments across diverse hand-object pairs demonstrate our approach generates stable, physically grounded grasps with strong generalization. This work enables semantic grasp transfer for heterogeneous manipulators and bridges vision-based grasping with probabilistic generative modeling. Additional details at https://grasp2grasp.github.io/
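The abstract describes learning a stochastic transport between source and target latent grasp spaces via score and flow matching. As a rough illustration only, and not the authors' implementation, the sketch below computes the regression targets that simulation-free score and flow matching typically uses for a Brownian bridge pinned between one source latent and one target latent. The latent vectors, the noise scale `sigma`, and the function names are hypothetical; the visual conditioning and the physics-informed costs that the paper uses to guide the translation are omitted.

```python
import math
import random


def bridge_sample(x0, x1, t, sigma, rng):
    """Sample x_t from a Brownian bridge pinned at x0 (t=0) and x1 (t=1)."""
    std = sigma * math.sqrt(t * (1.0 - t))
    mean = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    xt = [m + std * rng.gauss(0.0, 1.0) for m in mean]
    return xt, mean, std


def regression_targets(x0, x1, xt, mean, t, sigma):
    """Conditional flow and score targets at time t, valid for 0 < t < 1."""
    var = sigma ** 2 * t * (1.0 - t)
    coef = (1.0 - 2.0 * t) / (2.0 * t * (1.0 - t))
    # Flow target: drift of the conditional probability-flow ODE.
    flow = [coef * (x - m) + (b - a) for x, m, a, b in zip(xt, mean, x0, x1)]
    # Score target: gradient of the log of the Gaussian conditional density.
    score = [(m - x) / var for x, m in zip(xt, mean)]
    return flow, score


rng = random.Random(0)
z_src = [0.0, 1.0, -0.5]  # hypothetical source-hand grasp latent
z_tgt = [1.0, 0.0, 0.5]   # hypothetical target-hand grasp latent
sigma = 0.1               # hypothetical bridge noise scale

xt, mean, std = bridge_sample(z_src, z_tgt, t=0.5, sigma=sigma, rng=rng)
flow, score = regression_targets(z_src, z_tgt, xt, mean, t=0.5, sigma=sigma)
```

In a training loop of this kind, two networks would typically be regressed onto `flow` and `score` at randomly drawn times `t`, with the source-target pairs chosen by some coupling. How the paper selects pairs and injects its cost functions is not specified in the abstract, so that part is left out here.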

