CV · Dec 17, 2024

Move-in-2D: 2D-Conditioned Human Motion Generation

Hsin-Ping Huang, Yang Zhou, Jui-Hsien Wang, Difan Liu, Feng Liu, Ming-Hsuan Yang, Zhan Xu

arXiv:2412.13185v1 · 9.67 citations · h-index: 7 · CVPR

Originality: Incremental advance

AI Analysis

This work addresses the problem of generating diverse, scene-adaptive human motion for video synthesis. The advance is incremental because it builds on existing motion generation methods.

The paper generates human motion sequences conditioned on a scene image and a text prompt, using a diffusion model trained on a large-scale video dataset annotated with human motion. The generated motion aligns with the scene and improves human motion quality in video synthesis.

Generating realistic human videos remains a challenging task, with the most effective methods currently relying on a human motion sequence as a control signal. Existing approaches often use existing motion extracted from other videos, which restricts applications to specific motion types and global scene matching. We propose Move-in-2D, a novel approach to generate human motion sequences conditioned on a scene image, allowing for diverse motion that adapts to different scenes. Our approach utilizes a diffusion model that accepts both a scene image and text prompt as inputs, producing a motion sequence tailored to the scene. To train this model, we collect a large-scale video dataset featuring single-human activities, annotating each video with the corresponding human motion as the target output. Experiments demonstrate that our method effectively predicts human motion that aligns with the scene image after projection. Furthermore, we show that the generated motion sequence improves human motion quality in video synthesis tasks.
