Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models
This work addresses a challenge in computer vision and graphics, with applications such as animation and robotics, but it is incremental in that it builds on diffusion models with a hierarchical approach.
The paper tackles the problem of generating long-range and diverse 3D human-object interaction motions, which existing methods struggle with, and shows that its hierarchical framework outperforms previous methods by a large margin in quality and diversity on the NSM, COUCH, and SAMP datasets.
This paper presents a novel approach to generating the 3D motion of a human interacting with a target object, with a focus on synthesizing long-range and diverse motions, which existing auto-regressive models and path planning-based methods cannot achieve. We propose a hierarchical generation framework to solve this challenge. Specifically, our framework first generates a set of milestones and then synthesizes the motion along them. In this way, long-range motion generation is reduced to synthesizing several short motion sequences guided by milestones. Experiments on the NSM, COUCH, and SAMP datasets show that our approach outperforms previous methods by a large margin in both quality and diversity. The source code is available on our project page: https://zju3dv.github.io/hghoi.
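The two-stage structure described in the abstract (milestones first, then short motions between them) can be illustrated with a toy sketch. This is not the authors' implementation: the diffusion models are replaced here by jittered linear interpolation, the state is reduced to a 2D position, and all names (`generate_milestones`, `infill_segment`, `generate_motion`, `jitter`, `frames_per_segment`) are hypothetical and chosen only for illustration.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def lerp(a: Point, b: Point, s: float) -> Point:
    """Linear interpolation between two 2D points (s=0 gives a, s=1 gives b)."""
    return ((1.0 - s) * a[0] + s * b[0], (1.0 - s) * a[1] + s * b[1])


def generate_milestones(start: Point, goal: Point, count: int,
                        rng: random.Random, jitter: float = 0.2) -> List[Point]:
    """Stage 1 stand-in: sample `count` milestones from start to goal.

    A learned generative model would go here (assumption); this toy version
    perturbs evenly spaced points so different seeds give different paths.
    The last milestone is pinned to the goal.
    """
    milestones = []
    for i in range(1, count + 1):
        x, y = lerp(start, goal, i / count)
        if i < count:  # keep the final milestone exactly at the goal
            x += rng.uniform(-jitter, jitter)
            y += rng.uniform(-jitter, jitter)
        milestones.append((x, y))
    return milestones


def infill_segment(a: Point, b: Point, frames: int) -> List[Point]:
    """Stage 2 stand-in: a short motion from a to b (excludes a, includes b)."""
    return [lerp(a, b, k / frames) for k in range(1, frames + 1)]


def generate_motion(start: Point, goal: Point, count: int = 4,
                    frames_per_segment: int = 10,
                    seed: int = 0) -> Tuple[List[Point], List[Point]]:
    """Compose the long motion from short segments guided by milestones."""
    rng = random.Random(seed)
    milestones = generate_milestones(start, goal, count, rng)
    trajectory = [start]
    previous = start
    for milestone in milestones:
        trajectory.extend(infill_segment(previous, milestone, frames_per_segment))
        previous = milestone
    return milestones, trajectory


if __name__ == "__main__":
    milestones, trajectory = generate_motion((0.0, 0.0), (4.0, 2.0))
    print(len(milestones), "milestones,", len(trajectory), "frames")
```

The point of the sketch is the decomposition: each call to `infill_segment` only has to cover a short span between two known endpoints, however long the full trajectory is, and changing `seed` changes the milestones and therefore the overall path.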