CV · Mar 26, 2024

Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance

arXiv:2403.18036v1 · 96 citations · h-index: 25 · CVPR
Originality: Incremental advance
AI Analysis

This paper addresses the problem of language-guided human motion generation in 3D environments, with applications in robotics and virtual reality, and presents a novel method for a known bottleneck.

The paper tackles language-guided human motion generation in 3D scenes by introducing a two-stage framework that uses scene affordance as an intermediate representation, and it consistently outperforms baselines on benchmarks such as HumanML3D and HUMANISE.

Despite significant advancements in text-to-motion synthesis, generating language-guided human motion within 3D environments poses substantial challenges. These challenges stem primarily from (i) the absence of powerful generative models capable of jointly modeling natural language, 3D scenes, and human motion, and (ii) the intensive data requirements of generative models, contrasted with the scarcity of comprehensive, high-quality language-scene-motion datasets. To tackle these issues, we introduce a novel two-stage framework that employs scene affordance as an intermediate representation, effectively linking 3D scene grounding and conditional motion generation. Our framework comprises an Affordance Diffusion Model (ADM) for predicting an explicit affordance map and an Affordance-to-Motion Diffusion Model (AMDM) for generating plausible human motions. By leveraging scene affordance maps, our method overcomes the difficulty of generating human motion under multimodal conditioning signals, especially when training with limited data lacking extensive language-scene-motion pairs. Our extensive experiments demonstrate that our approach consistently outperforms all baselines on established benchmarks, including HumanML3D and HUMANISE. Additionally, we validate our model's exceptional generalization capabilities on a specially curated evaluation set featuring previously unseen descriptions and scenes.
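The data flow of the two-stage design (text + scene, then affordance map, then motion) can be illustrated with a minimal, pure-Python sketch. This is not the authors' implementation: the learned ADM and AMDM networks are replaced by hypothetical toy denoisers (`toy_adm`, `toy_amdm`), the affordance map is assumed to be one score per scene point, the text is assumed to be already grounded to a 3D `"goal"` point, and both stages are assumed to use standard DDPM ancestral sampling with an x_0-predicting denoiser and a linear beta schedule. Only the chaining of the two diffusion stages reflects the abstract.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
# Hypothetical denoiser signature: (noisy sample x_t, timestep t, condition) -> predicted clean x_0.
Denoiser = Callable[[List[float], int, Dict], List[float]]


def ddpm_sample(denoiser: Denoiser, dim: int, cond: Dict,
                num_steps: int = 50, seed: int = 0) -> List[float]:
    """DDPM ancestral sampling with an x_0-predicting denoiser and a linear beta schedule."""
    rng = random.Random(seed)
    betas = [1e-4 + (0.02 - 1e-4) * i / max(num_steps - 1, 1) for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        x0_hat = denoiser(x, t, cond)
        ab_t = alpha_bars[t]
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        # Mean of the Gaussian posterior q(x_{t-1} | x_t, x_0 = x0_hat).
        c0 = math.sqrt(ab_prev) * betas[t] / (1.0 - ab_t)
        ct = math.sqrt(alphas[t]) * (1.0 - ab_prev) / (1.0 - ab_t)
        mean = [c0 * a + ct * b for a, b in zip(x0_hat, x)]
        if t > 0:
            std = math.sqrt(betas[t] * (1.0 - ab_prev) / (1.0 - ab_t))
            x = [m + std * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # final step: no noise is added
    return x


def generate_motion(cond_text: Dict, scene: Sequence[Point], adm: Denoiser, amdm: Denoiser,
                    num_frames: int, pose_dim: int, num_steps: int = 50):
    """Two-stage pipeline: (text, scene) -> affordance map -> motion sequence."""
    # Stage 1 (ADM stand-in): sample one affordance score per scene point.
    affordance = ddpm_sample(adm, len(scene), {**cond_text, "scene": scene}, num_steps, seed=0)
    # Stage 2 (AMDM stand-in): sample a flattened motion conditioned on the affordance map.
    flat = ddpm_sample(amdm, num_frames * pose_dim,
                       {**cond_text, "scene": scene, "affordance": affordance},
                       num_steps, seed=1)
    motion = [flat[i * pose_dim:(i + 1) * pose_dim] for i in range(num_frames)]
    return affordance, motion


# Toy denoisers standing in for the learned networks (hypothetical, for illustration only).
def toy_adm(x: List[float], t: int, cond: Dict) -> List[float]:
    gx, gy, gz = cond["goal"]  # assume the text was already grounded to a 3D goal point
    return [math.exp(-((px - gx) ** 2 + (py - gy) ** 2 + (pz - gz) ** 2))
            for px, py, pz in cond["scene"]]


def toy_amdm(x: List[float], t: int, cond: Dict) -> List[float]:
    # Move the root joint (pose_dim = 3) in a straight line from the origin
    # to the scene point with the highest predicted affordance.
    aff = cond["affordance"]
    target = cond["scene"][max(range(len(aff)), key=aff.__getitem__)]
    num_frames = len(x) // 3
    frames: List[float] = []
    for i in range(num_frames):
        s = i / max(num_frames - 1, 1)
        frames.extend(s * c for c in target)
    return frames


if __name__ == "__main__":
    scene = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
    affordance, motion = generate_motion({"goal": (2.0, 1.0, 0.0)}, scene, toy_adm, toy_amdm,
                                         num_frames=5, pose_dim=3)
    print(max(range(len(affordance)), key=affordance.__getitem__))  # 2
    print([round(v, 6) for v in motion[-1]])  # [2.0, 1.0, 0.0]
```

Because the second stage only sees the scene through the sampled affordance map, the motion model never has to ground language in raw 3D geometry itself; in this toy version the final frame lands on the point the first stage marked as most afforded.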

Code Implementations: 1 repo
