GR · CV · Jun 8, 2025

HOI-PAGE: Zero-Shot Human-Object Interaction Generation with Part Affordance Guidance

arXiv:2506.07209v1 · 3 citations · h-index: 1
Originality: Incremental advance
AI Analysis

The paper addresses the problem of synthesizing realistic human-object interactions for applications such as animation and robotics. The advance is incremental, since it builds on prior motion generation methods.

The paper tackles zero-shot generation of 4D human-object interactions from text prompts by using part-level affordance reasoning, resulting in improved realism and text alignment for complex sequences.

We present HOI-PAGE, a new approach to synthesizing 4D human-object interactions (HOIs) from text prompts in a zero-shot fashion, driven by part-level affordance reasoning. In contrast to prior works that focus on global, whole body-object motion for 4D HOI synthesis, we observe that generating realistic and diverse HOIs requires a finer-grained understanding -- at the level of how human body parts engage with object parts. We thus introduce Part Affordance Graphs (PAGs), a structured HOI representation distilled from large language models (LLMs) that encodes fine-grained part information along with contact relations. We then use these PAGs to guide a three-stage synthesis: first, decomposing input 3D objects into geometric parts; then, generating reference HOI videos from text prompts, from which we extract part-based motion constraints; finally, optimizing for 4D HOI motion sequences that not only mimic the reference dynamics but also satisfy part-level contact constraints. Extensive experiments show that our approach is flexible and capable of generating complex multi-object or multi-person interaction sequences, with significantly improved realism and text alignment for zero-shot 4D HOI generation.
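To make the idea of a Part Affordance Graph more concrete, the following is a minimal sketch of one way such a structure could be represented: body parts and object parts as nodes, with contact relations as edges between them. All names here (`ContactEdge`, `PartAffordanceGraph`, `unsatisfied_contacts`, the part labels, and the 5 cm threshold) are illustrative assumptions and are not taken from the paper, which may define PAGs and its contact constraints differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ContactEdge:
    """Hypothetical contact relation between one body part and one object part."""
    body_part: str    # e.g. "right_hand" (illustrative label)
    object_part: str  # e.g. "mug/handle" (illustrative label)


@dataclass
class PartAffordanceGraph:
    """Assumed minimal PAG: part nodes plus body-part/object-part contact edges."""
    body_parts: set[str] = field(default_factory=set)
    object_parts: set[str] = field(default_factory=set)
    contacts: set[ContactEdge] = field(default_factory=set)

    def add_contact(self, body_part: str, object_part: str) -> None:
        # Register both nodes, then the edge that links them.
        self.body_parts.add(body_part)
        self.object_parts.add(object_part)
        self.contacts.add(ContactEdge(body_part, object_part))

    def object_parts_touched_by(self, body_part: str) -> list[str]:
        # Sorted so the result is deterministic.
        return sorted(e.object_part for e in self.contacts if e.body_part == body_part)


def unsatisfied_contacts(
    pag: PartAffordanceGraph,
    distances: dict[tuple[str, str], float],
    threshold: float = 0.05,  # assumed tolerance in meters, not from the paper
) -> list[ContactEdge]:
    """Return contact edges whose measured part-to-part distance exceeds the threshold.

    `distances` maps (body_part, object_part) to a distance for one frame.
    Pairs missing from `distances` are treated as unsatisfied.
    """
    violated = [
        e for e in pag.contacts
        if distances.get((e.body_part, e.object_part), float("inf")) > threshold
    ]
    return sorted(violated, key=lambda e: (e.body_part, e.object_part))


if __name__ == "__main__":
    pag = PartAffordanceGraph()
    pag.add_contact("right_hand", "mug/handle")
    pag.add_contact("pelvis", "chair/seat")
    frame = {("right_hand", "mug/handle"): 0.01, ("pelvis", "chair/seat"): 0.30}
    print(unsatisfied_contacts(pag, frame))
```

In a pipeline like the one described, a check of this kind would be only a diagnostic. The abstract states that contact constraints are satisfied through optimization over the 4D motion sequence, which a simple per-frame threshold test does not capture.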

