ChainHOI: Joint-based Kinematic Chain Modeling for Human-Object Interaction Generation
This work addresses the challenge of generating biomechanically coherent human-object interactions for applications in animation and robotics, and represents an incremental advance over existing approaches that model interactions implicitly.
The paper tackles the problem of generating realistic human-object interactions (HOIs) from text by explicitly modeling interactions at both the joint and kinematic chain levels, yielding significant performance improvements over previous methods on two public datasets.
We propose ChainHOI, a novel approach for text-driven human-object interaction (HOI) generation that explicitly models interactions at both the joint and kinematic chain levels. Unlike existing methods, which model interactions implicitly by using full-body poses as tokens, we argue that explicitly modeling joint-level interactions is more natural and effective for generating realistic HOIs: it directly captures the geometric and semantic relationships between joints instead of modeling interactions in a latent pose space. To this end, ChainHOI introduces a novel joint graph to capture potential interactions with objects, and a Generative Spatiotemporal Graph Convolution Network to explicitly model interactions at the joint level. Furthermore, we propose a Kinematics-based Interaction Module that explicitly models interactions at the kinematic chain level, ensuring more realistic and biomechanically coherent motions. Evaluations on two public datasets demonstrate that ChainHOI significantly outperforms previous methods, generating more realistic and semantically consistent HOIs. Code is available at https://github.com/qinghuannn/ChainHOI.
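The abstract describes joint-level modeling (a joint graph processed by a spatiotemporal graph convolution) and chain-level modeling only at a high level. The sketch below illustrates these general ideas under our own assumptions. It uses a hypothetical 7-joint toy skeleton and adds the object as one extra graph node linked to every joint. The convolution step is a standard symmetrically normalized graph convolution followed by a temporal moving average and a ReLU, and chain-level features come from mean pooling over hand-picked kinematic chains. It omits the generative (denoising) side of the network entirely and is not the authors' ChainHOI implementation; `joint_graph`, `st_graph_conv`, `chain_features`, `BONES`, and `CHAINS` are hypothetical names.

```python
import math

# Hypothetical toy skeleton (NOT the skeleton used in the paper):
# 0 pelvis, 1 spine, 2 neck, 3 l_elbow, 4 l_wrist, 5 r_elbow, 6 r_wrist
BONES = [(0, 1), (1, 2), (1, 3), (3, 4), (1, 5), (5, 6)]
NUM_JOINTS = 7  # the object is appended as node index NUM_JOINTS

# Hypothetical kinematic chains: paths of joint indices from the trunk outward.
CHAINS = {"torso": [0, 1, 2], "left_arm": [1, 3, 4], "right_arm": [1, 5, 6]}


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def joint_graph(num_joints=NUM_JOINTS, bones=BONES):
    """Adjacency over joints + one object node, with self-loops,
    symmetrically normalized as D^-1/2 (A + I) D^-1/2."""
    n = num_joints + 1
    obj = num_joints
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0
    for a, b in bones:
        adj[a][b] = adj[b][a] = 1.0
    # Assumption: any joint may touch the object, so link the object node to all joints.
    for j in range(num_joints):
        adj[j][obj] = adj[obj][j] = 1.0
    deg = [sum(row) for row in adj]
    return [[adj[i][k] / math.sqrt(deg[i] * deg[k]) for k in range(n)] for i in range(n)]


def st_graph_conv(x, adj, weight, window=1):
    """x: [T][N][C_in] node features per frame; returns [T][N][C_out].
    Spatial step: mix each node with its graph neighbours, then project with `weight`.
    Temporal step: average each node over frames t-window..t+window (clipped at
    the sequence ends), then apply ReLU."""
    spatial = [matmul(matmul(adj, frame), weight) for frame in x]
    T, N, C = len(spatial), len(spatial[0]), len(spatial[0][0])
    out = []
    for t in range(T):
        frames = spatial[max(0, t - window):min(T, t + window + 1)]
        out.append([[max(0.0, sum(f[n][c] for f in frames) / len(frames))
                     for c in range(C)] for n in range(N)])
    return out


def chain_features(frame, chains=CHAINS):
    """Mean-pool joint features along each kinematic chain for a single frame."""
    C = len(frame[0])
    return {name: [sum(frame[j][c] for j in idx) / len(idx) for c in range(C)]
            for name, idx in chains.items()}


# Usage: 4 frames, 7 joints + 1 object node, 3 input channels -> 2 output channels.
T, C_IN, C_OUT = 4, 3, 2
x = [[[0.1 * (t + n + c) for c in range(C_IN)] for n in range(NUM_JOINTS + 1)]
     for t in range(T)]
weight = [[1.0 if i == o else 0.5 for o in range(C_OUT)] for i in range(C_IN)]
adj = joint_graph()
h = st_graph_conv(x, adj, weight)
print(len(h), len(h[0]), len(h[0][0]))  # 4 8 2
print(sorted(chain_features(h[0])))     # ['left_arm', 'right_arm', 'torso']
```

Linking the object node to every joint lets the graph convolution propagate object information to any body part in one hop, which is one simple way to represent "potential" interactions; a learned or contact-based edge set would be an equally plausible alternative.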