CV
Dec 11, 2023

HOI-Diff: Text-Driven Synthesis of 3D Human-Object Interactions using Diffusion Models

arXiv:2312.06553v3
92 citations
h-index: 33
2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Originality: Incremental advance
AI Analysis

This work addresses the challenge of synthesizing diverse, coherent 3D human-object interactions for applications in animation, robotics, and virtual reality, and represents an incremental advance in text-driven 3D generation.

The paper tackles the problem of generating realistic 3D human-object interactions from text prompts by decomposing the task into motion generation and affordance prediction, each handled by a diffusion model, and reports realistic results on the BEHAVE and OMOMO datasets.

We address the problem of generating realistic 3D human-object interactions (HOIs) driven by textual prompts. To this end, we adopt a modular design and decompose the complex task into simpler sub-tasks. We first develop a dual-branch diffusion model (HOI-DM) that generates both human and object motions conditioned on the input text, and we encourage coherent motions through a cross-attention communication module between the human and object motion generation branches. We also develop an affordance prediction diffusion model (APDM) that predicts the contact area between the human and the object during the interaction described by the textual prompt. The APDM does not depend on the output of the HOI-DM and can therefore correct potential errors made by the latter. Moreover, it generates contact points stochastically, which diversifies the generated motions. Finally, we incorporate the estimated contact points into classifier guidance to achieve accurate and close contact between humans and objects. To train and evaluate our approach, we annotate the BEHAVE dataset with text descriptions. Experimental results on BEHAVE and OMOMO demonstrate that our approach produces realistic HOIs with various interactions and different types of objects.
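The last step of the pipeline, using predicted contact points to steer sampling toward close contact, can be illustrated with a small NumPy sketch. It is not the authors' implementation. The function names (`contact_loss_and_grads`, `guidance_step`), the data layout (per-frame joint positions, a rigid object given by per-frame rotation and translation, one contact point in the object's local frame, a per-frame contact mask) and the squared-distance loss are all assumptions made for illustration. The sketch takes one gradient step on a predicted clean sample, pulling a chosen human joint and the object's contact point toward each other on frames where contact is required. In a full sampler, a step of this kind would be repeated inside the denoising loop.

```python
import numpy as np


def contact_loss_and_grads(joints, obj_trans, obj_rot, contact_local, joint_idx, contact_mask):
    """Squared-distance contact loss for one predicted clean sample (hypothetical formulation).

    joints:        (T, J, 3) predicted human joint positions
    obj_trans:     (T, 3) predicted object translation per frame
    obj_rot:       (T, 3, 3) object rotation per frame (held fixed in this sketch)
    contact_local: (3,) contact point in the object's local frame, e.g. sampled by an affordance model
    joint_idx:     index of the human joint that should touch the object (e.g. a hand)
    contact_mask:  (T,) boolean, True on frames where contact is required
    """
    # World-space contact point on each frame: R_t @ c + t_t
    contact_world = np.einsum("tij,j->ti", obj_rot, contact_local) + obj_trans
    diff = joints[:, joint_idx] - contact_world  # (T, 3) residual between joint and contact point
    mask = contact_mask.astype(float)[:, None]   # (T, 1), zero on frames without contact
    loss = float(np.sum(mask * diff**2))

    # Analytic gradients of the masked squared distance
    grad_joints = np.zeros_like(joints)
    grad_joints[:, joint_idx] = 2.0 * mask * diff
    grad_trans = -2.0 * mask * diff
    return loss, grad_joints, grad_trans


def guidance_step(joints, obj_trans, obj_rot, contact_local, joint_idx, contact_mask, scale=0.1):
    """One gradient-descent step on the contact loss (hypothetical guidance update)."""
    _, g_joints, g_trans = contact_loss_and_grads(
        joints, obj_trans, obj_rot, contact_local, joint_idx, contact_mask
    )
    return joints - scale * g_joints, obj_trans - scale * g_trans


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, J = 8, 22                                # 8 frames, 22 joints (hypothetical skeleton size)
    joints = rng.normal(size=(T, J, 3))
    obj_trans = rng.normal(size=(T, 3))
    obj_rot = np.tile(np.eye(3), (T, 1, 1))     # identity rotations for simplicity
    contact_local = np.array([0.0, 0.1, 0.0])
    contact_mask = np.zeros(T, dtype=bool)
    contact_mask[2:6] = True                    # contact required on frames 2..5
    hand = 20                                   # hypothetical index of a hand joint

    before, _, _ = contact_loss_and_grads(joints, obj_trans, obj_rot, contact_local, hand, contact_mask)
    joints2, trans2 = guidance_step(joints, obj_trans, obj_rot, contact_local, hand, contact_mask, scale=0.1)
    after, _, _ = contact_loss_and_grads(joints2, trans2, obj_rot, contact_local, hand, contact_mask)
    # Each masked residual is scaled by (1 - 4 * 0.1) = 0.6, so after equals 0.36 * before up to rounding
    print(f"contact loss before: {before:.4f}, after: {after:.4f}")
```

Because the joint and the object translation each move toward the other by the same amount, every masked residual shrinks by a factor of (1 - 4 * scale), and frames without required contact are left untouched. Analytic gradients keep the sketch dependency-light; a fuller version would typically obtain the gradient from an autograd framework so it can also flow through the denoising network.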

