ScoreHOI: Physically Plausible Reconstruction of Human-Object Interaction via Score-Guided Diffusion
This work addresses accurate reconstruction of human-object interactions from images, an incremental improvement whose main gains are in physical plausibility.
The paper tackles the problem of reconstructing physically plausible human-object interactions from images by introducing ScoreHOI, a diffusion-based optimizer that combines score-guided sampling with physical constraints and outperforms state-of-the-art methods on standard benchmarks.
Joint reconstruction of human-object interaction marks a significant milestone in understanding the intricate interrelations between humans and their surrounding environment. Nevertheless, previous optimization methods often fail to produce physically plausible reconstructions because they lack prior knowledge about human-object interactions. In this paper, we introduce ScoreHOI, an effective diffusion-based optimizer that brings diffusion priors to the precise recovery of human-object interactions. By exploiting the controllability of score-guided sampling, the diffusion model captures the conditional distribution of human and object poses given the image observation and object features. During inference, ScoreHOI improves the reconstruction results by guiding the denoising process with specific physical constraints. Furthermore, we propose a contact-driven iterative refinement approach that enhances contact plausibility and improves reconstruction accuracy. Extensive evaluations on standard benchmarks show that ScoreHOI outperforms state-of-the-art methods, delivering precise and robust improvements in joint human-object interaction reconstruction.
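The abstract does not specify ScoreHOI's sampler or its constraint terms, so the following is only a minimal sketch of the general idea of guiding a denoising process with a physical constraint. It assumes a deterministic DDIM sampler, a toy 2-D "pose" prior N(MU, I) whose exact posterior mean stands in for a trained, image-conditioned network, and a hypothetical ground-penetration penalty on the second coordinate. None of `MU`, `denoise`, `penetration_grad`, or the noise schedule come from the paper. At every step the clean-sample estimate is nudged down the gradient of the penalty before the next DDIM update.

```python
import numpy as np

# Toy setup (illustrative assumptions, not from the paper):
# each sample is a 2-D "pose" [horizontal, height]; the data prior is N(MU, I),
# and the hypothetical physical constraint is height >= 0 (no ground penetration).
MU = np.array([0.0, -0.5])
T = 50
betas = np.linspace(1e-3, 0.2, T)
alpha_bar = np.cumprod(1.0 - betas)


def denoise(x_t, ab):
    """Exact E[x0 | x_t] for a N(MU, I) prior; stands in for a learned denoiser."""
    return MU + np.sqrt(ab) * (x_t - np.sqrt(ab) * MU)


def penetration_grad(x0):
    """Gradient of the hypothetical penalty L(x0) = sum(max(0, -height)^2)."""
    grad = np.zeros_like(x0)
    grad[:, 1] = -2.0 * np.maximum(0.0, -x0[:, 1])
    return grad


def sample(x_T, guidance_scale=0.0):
    """Deterministic DDIM sampling with optional constraint guidance on x0."""
    x = x_T
    for t in reversed(range(T)):
        ab = alpha_bar[t]
        ab_prev = alpha_bar[t - 1] if t > 0 else 1.0
        x0_hat = denoise(x, ab)
        # Noise estimate implied by the unguided clean-sample prediction.
        eps_hat = (x - np.sqrt(ab) * x0_hat) / np.sqrt(1.0 - ab)
        # Guidance: one gradient step on the physical penalty.
        x0_hat = x0_hat - guidance_scale * penetration_grad(x0_hat)
        x = np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps_hat
    return x


def mean_depth(x):
    """Average penetration depth below the ground plane (height < 0)."""
    return np.maximum(0.0, -x[:, 1]).mean()


rng = np.random.default_rng(0)
x_T = rng.standard_normal((2000, 2))
plain = sample(x_T)
guided = sample(x_T, guidance_scale=0.3)
print("mean depth, unguided:", mean_depth(plain))
print("mean depth, guided:  ", mean_depth(guided))
```

Starting both runs from the same noise isolates the effect of guidance: the horizontal coordinate is untouched because the penalty has no gradient there, while the average penetration depth drops. In a real system the analytic `denoise` would be replaced by a network conditioned on image and object features, and the penalty by whatever physical losses the method actually uses.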