CV | Mar 26, 2025

Guiding Human-Object Interactions with Rich Geometry and Relations

arXiv:2503.20172v1 | 18 citations | h-index: 4 | CVPR
Originality: Incremental advance
AI Analysis

This work improves HOI synthesis for applications like virtual reality by enhancing geometric fidelity, though it appears incremental as it builds on existing diffusion-based methods.

The paper tackles realistic human-object interaction (HOI) generation by addressing the limitations of simplified object representations. It introduces ROG, a diffusion-based framework that uses boundary-focused key points and interactive distance fields to model geometric complexity, and it reports state-of-the-art performance in realism and semantic accuracy.

Human-object interaction (HOI) synthesis is crucial for creating immersive and realistic experiences for applications such as virtual reality. Existing methods often rely on simplified object representations, such as the object's centroid or the nearest point to a human, to achieve physically plausible motions. However, these approaches may overlook geometric complexity, resulting in suboptimal interaction fidelity. To address this limitation, we introduce ROG, a novel diffusion-based framework that models the spatiotemporal relationships inherent in HOIs with rich geometric detail. For efficient object representation, we select boundary-focused and fine-detail key points from the object mesh, ensuring a comprehensive depiction of the object's geometry. This representation is used to construct an interactive distance field (IDF), capturing the robust HOI dynamics. Furthermore, we develop a diffusion-based relation model that integrates spatial and temporal attention mechanisms, enabling a better understanding of intricate HOI relationships. This relation model refines the generated motion's IDF, guiding the motion generation process to produce relation-aware and semantically aligned movements. Experimental evaluations demonstrate that ROG significantly outperforms state-of-the-art methods in the realism and semantic accuracy of synthesized HOIs.
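To make the object representation concrete, here is a minimal sketch of the two ingredients the abstract names: picking a small set of key points from the object mesh and building a per-frame distance field between body joints and those key points. This is an illustration under stated assumptions, not the paper's implementation. Farthest point sampling is swapped in as a generic stand-in for the boundary-focused, fine-detail selection, and the field is assumed to be plain pairwise Euclidean distances. The function names `farthest_point_sampling` and `distance_field` and the toy data are hypothetical.

```python
import math

def farthest_point_sampling(points, k, start=0):
    """Return indices of k well-spread points.

    Generic stand-in for key point selection; NOT the paper's selector.
    """
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        # Greedily take the point farthest from everything chosen so far.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        min_dist = [min(d, math.dist(p, points[nxt]))
                    for d, p in zip(min_dist, points)]
    return chosen

def distance_field(joints_per_frame, keypoints_per_frame):
    """Joint-to-key-point distances as nested lists of shape T x J x K.

    Assumption: the field is the pairwise Euclidean distance per frame.
    """
    return [
        [[math.dist(j, q) for q in keypoints] for j in joints]
        for joints, keypoints in zip(joints_per_frame, keypoints_per_frame)
    ]

# Toy "mesh": the 8 corners of a unit cube plus its centre.
vertices = [(x, y, z) for x in (0.0, 1.0) for y in (0.0, 1.0)
            for z in (0.0, 1.0)] + [(0.5, 0.5, 0.5)]
key_idx = farthest_point_sampling(vertices, k=4)
keypoints = [vertices[i] for i in key_idx]

# Two frames of a two-joint "body" approaching a static object.
joints = [
    [(2.0, 0.0, 0.0), (2.0, 1.0, 0.0)],
    [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
]
field = distance_field(joints, [keypoints] * len(joints))
```

Here `field[t][j][k]` is the distance between joint `j` and key point `k` at frame `t`, so a hand closing in on the object shows up as shrinking entries over time. Unlike a centroid or nearest-point representation, the field keeps one distance per key point, which is where the extra geometric detail comes from.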

