CV · Apr 18, 2024

G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis

Yufei Ye, Abhinav Gupta, Kris Kitani, Shubham Tulsiani

arXiv:2404.12383v1 · 22.743 citations · h-index: 45 · CVPR

Originality: Highly original

AI Analysis

This work addresses the challenge of generating realistic 3D hand and object interactions for applications in robotics and computer vision, representing a first approach to jointly generating both components.

The paper tackled the problem of jointly modeling 3D hand-object interactions by proposing G-HOP, a denoising diffusion-based generative prior, which outperformed task-specific baselines in video-based reconstruction and human grasp synthesis.

We propose G-HOP, a denoising diffusion-based generative prior for hand-object interactions that allows modeling both the 3D object and a human hand, conditioned on the object category. To learn a 3D spatial diffusion model that can capture this joint distribution, we represent the human hand via a skeletal distance field to obtain a representation aligned with the (latent) signed distance field for the object. We show that this hand-object prior can then serve as generic guidance to facilitate other tasks such as reconstruction from interaction clips and human grasp synthesis. We believe that our model, trained by aggregating seven diverse real-world interaction datasets spanning 155 categories, represents a first approach that allows jointly generating both hand and object. Our empirical evaluations demonstrate the benefit of this joint prior in video-based reconstruction and human grasp synthesis, outperforming current task-specific baselines. Project website: https://judyye.github.io/ghop-www
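To make the idea of a spatially aligned hand-object representation more concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes that a skeletal distance field stores, for every voxel, the Euclidean distance to each hand joint (one channel per joint), and it stacks that with a plain signed distance field of a sphere standing in for the object. The function names, the 21-joint count, the grid resolution, and the use of a raw rather than latent SDF are all illustrative assumptions.

```python
import numpy as np

def make_grid(resolution=16, extent=1.0):
    """Return voxel-center coordinates with shape (R, R, R, 3)."""
    axis = np.linspace(-extent, extent, resolution)
    return np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)

def skeletal_distance_field(joints, resolution=16, extent=1.0):
    """Hypothetical helper: out[j, x, y, z] = distance from voxel to joint j."""
    grid = make_grid(resolution, extent)
    # Broadcast (1, R, R, R, 3) against (J, 1, 1, 1, 3) -> (J, R, R, R, 3)
    diff = grid[None] - joints[:, None, None, None, :]
    return np.linalg.norm(diff, axis=-1)

def sphere_sdf(resolution=16, extent=1.0, radius=0.5):
    """Stand-in object: signed distance to a sphere centered at the origin."""
    grid = make_grid(resolution, extent)
    return np.linalg.norm(grid, axis=-1) - radius

# Assumed toy input: 21 random joint positions (not real hand data).
joints = np.random.default_rng(0).uniform(-0.5, 0.5, size=(21, 3))
hand_field = skeletal_distance_field(joints)      # (21, 16, 16, 16)
object_field = sphere_sdf()                       # (16, 16, 16)

# Both fields live on the same voxel grid, so they can be stacked
# channel-wise into a single tensor for a 3D model to consume.
joint_grid = np.concatenate([object_field[None], hand_field], axis=0)
print(joint_grid.shape)
```

Because both fields are sampled on the same grid, a single 3D network can in principle process hand and object together, which is the alignment property the abstract refers to.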

