CV · Oct 17, 2024

GraspDiffusion: Synthesizing Realistic Whole-body Hand-Object Interaction

arXiv:2410.13911v2 · 7 citations · h-index: 8
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of creating realistic human-object interaction images for applications such as virtual reality and animation. The advance is incremental, since it builds on prior generative models.

The paper targets a task that existing generative models often fail at: producing realistic images of humans interacting with objects using their hands. It proposes GraspDiffusion, which synthesizes whole-body hand-object interaction scenes and is reported to outperform previous methods.

Recent generative models can synthesize high-quality images but often fail to generate humans interacting with objects using their hands. This arises mostly from the model's misunderstanding of such interactions, and the hardships of synthesizing intricate regions of the body. In this paper, we propose GraspDiffusion, a novel generative method that creates realistic scenes of human-object interaction. Given a 3D object mesh, GraspDiffusion first constructs life-like whole-body poses with control over the object's location relative to the human body. This is achieved by separately leveraging the generative priors for 3D body and hand poses, optimizing them into a joint grasping pose. The resulting pose guides the image synthesis to correctly reflect the intended interaction, allowing the creation of realistic and diverse human-object interaction scenes. We demonstrate that GraspDiffusion can successfully tackle the relatively uninvestigated problem of generating full-bodied human-object interactions while outperforming previous methods. Code and models will be available at https://webtoon.github.io/GraspDiffusion

