Towards Realistic Hand-Object Interaction with Gravity-Field Based Diffusion Bridge
This work addresses the problem of generating realistic hand-object interactions for computer vision and robotics applications. Its contribution is incremental in that it builds on existing reconstruction and pose estimation methods.
The paper tackles realistic hand-object interaction by addressing interpenetration, contact gaps, and hand surface deformation, producing physically plausible interactions and reporting improved performance on multiple datasets.
Existing reconstruction or hand-object pose estimation methods are capable of producing coarse interaction states. However, due to the complex and diverse geometry of both human hands and objects, these approaches often suffer from interpenetration or leave noticeable gaps in regions that are supposed to be in contact. Moreover, the surface of a real human hand undergoes non-negligible deformations during interaction, which are difficult to capture and represent with previous methods. To tackle these challenges, we formulate hand-object interaction as an attraction-driven process and propose a Gravity-Field Based Diffusion Bridge (GravityDB) to simulate interactions between a deformable hand surface and rigid objects. Our approach effectively resolves the aforementioned issues by generating physically plausible interactions that are free of interpenetration, ensure stable grasping, and capture realistic hand deformations. Furthermore, we incorporate semantic information from textual descriptions to guide the construction of the gravitational field, enabling more semantically meaningful interaction regions. Extensive qualitative and quantitative experiments on multiple datasets demonstrate the effectiveness of our method.
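To give an intuition for what "attraction-driven" can mean, the following is a minimal toy sketch and not the GravityDB method itself. It assumes a rigid unit sphere as the object and a handful of free points standing in for hand-surface vertices. Each point is pulled toward the sphere surface by a fraction of its remaining gap and is never allowed to cross it. The function name `attract_to_sphere` and the parameters `radius`, `step`, and `n_steps` are hypothetical, and the sketch involves no diffusion bridge, no learned field, and no text guidance.

```python
import math

def attract_to_sphere(points, radius=1.0, step=0.2, n_steps=50):
    """Toy attraction toward a sphere centered at the origin (illustrative only)."""
    result = []
    for p in points:
        x, y, z = (float(c) for c in p)
        for _ in range(n_steps):
            dist = math.sqrt(x * x + y * y + z * z)
            if dist == 0.0:
                break  # direction is undefined at the center; leave the point as-is
            gap = dist - radius  # signed distance to the surface
            # Move a fraction of the gap toward the surface, but never end up inside.
            new_dist = max(dist - step * gap, radius)
            scale = new_dist / dist
            x, y, z = x * scale, y * scale, z * scale
        result.append((x, y, z))
    return result

# Two points start outside the sphere, one starts inside (interpenetrating).
contacts = attract_to_sphere([(2.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.5, 0.0, 0.0)])
```

In this toy setting the gap of an outside point shrinks geometrically with each step, and a point that starts inside is pushed back onto the surface, which mirrors the two failure modes (gaps and interpenetration) described above in the simplest possible geometry.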