InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion
This work addresses the challenge of generating realistic two-hand interactions for applications in computer vision and graphics. It is an incremental improvement that applies a novel method to a known bottleneck.
The paper tackles the problem of generating plausible and diverse two-hand interactions. It proposes InterHandGen, a framework that decomposes joint distribution modeling into factored single-hand distributions, learned by a diffusion model trained with conditioning dropout. The method significantly outperforms baseline generative models in plausibility and diversity, and it also boosts two-hand reconstruction accuracy to state-of-the-art levels.
We present InterHandGen, a novel framework that learns the generative prior of two-hand interaction. Sampling from our model yields plausible and diverse two-hand shapes in close interaction, with or without an object. Our prior can be incorporated into any optimization or learning method to reduce ambiguity in an ill-posed setup. Our key observation is that directly modeling the joint distribution of multiple instances imposes high learning complexity due to its combinatorial nature. Thus, we propose to decompose the modeling of the joint distribution into the modeling of factored unconditional and conditional single-instance distributions. In particular, we introduce a diffusion model that learns the single-hand distribution both unconditionally and conditioned on the other hand via conditioning dropout. For sampling, we combine anti-penetration and classifier-free guidance to enable plausible generation. Furthermore, we establish a rigorous evaluation protocol for two-hand synthesis, under which our method significantly outperforms baseline generative models in terms of plausibility and diversity. We also demonstrate that our diffusion prior can boost the performance of two-hand reconstruction from monocular in-the-wild images, achieving new state-of-the-art accuracy.
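The factorization described in the abstract follows the chain rule, p(x_a, x_b) = p(x_a) p(x_b | x_a), so one network can serve both terms if its condition is sometimes dropped during training. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. The function names, the use of an all-zero vector as the "null" condition, the guidance formula with a single `scale` parameter, and the `sample_hand` callback are all hypothetical choices made for illustration. Anti-penetration guidance is omitted.

```python
import numpy as np


def drop_condition(cond, p_drop, rng):
    """Conditioning dropout (training time).

    With probability p_drop per sample, replace the other-hand condition
    with a null condition. ASSUMPTION: the null condition is all zeros.
    Returns the masked conditions and the boolean keep mask.
    """
    keep = rng.random(cond.shape[0]) >= p_drop  # True where the condition is kept
    return cond * keep[:, None], keep


def guided_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance (sampling time).

    Moves the unconditional noise prediction toward the conditional one.
    scale = 0 gives the unconditional prediction, scale = 1 the conditional
    one, and scale > 1 extrapolates past it.
    """
    return eps_uncond + scale * (eps_cond - eps_uncond)


def cascaded_sample(sample_hand, dim, rng):
    """Two-stage cascade: x_a ~ p(x), then x_b ~ p(x | x_a).

    sample_hand(cond, rng) is a hypothetical callback that runs a full
    reverse diffusion process for one hand given a condition vector.
    """
    null_cond = np.zeros(dim)             # ASSUMPTION: zeros mean "no condition"
    first = sample_hand(null_cond, rng)   # unconditional single-hand sample
    second = sample_hand(first, rng)      # second hand conditioned on the first
    return first, second
```

In a training loop, `drop_condition` would be applied to each batch of other-hand parameters before they are passed to the denoiser, so the same weights learn both the unconditional and the conditional distribution. At sampling time the denoiser would be evaluated twice per step, once with the null condition and once with the real one, and the two outputs merged with `guided_noise`.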