CV · Dec 17, 2024

Move-in-2D: 2D-Conditioned Human Motion Generation

arXiv:2412.13185v1 · 7 citations · h-index: 7 · CVPR
Originality: Incremental advance
AI Analysis

This paper addresses the problem of generating diverse, scene-adaptive human motion for video synthesis. The advance is incremental because it builds on existing motion generation methods.

The paper tackles generating human motion sequences conditioned on scene images and text prompts, using a diffusion model trained on a large-scale annotated video dataset, resulting in motion that aligns with scenes and improves video synthesis quality.

Generating realistic human videos remains a challenging task, with the most effective methods currently relying on a human motion sequence as a control signal. Existing approaches often use existing motion extracted from other videos, which restricts applications to specific motion types and global scene matching. We propose Move-in-2D, a novel approach to generate human motion sequences conditioned on a scene image, allowing for diverse motion that adapts to different scenes. Our approach utilizes a diffusion model that accepts both a scene image and text prompt as inputs, producing a motion sequence tailored to the scene. To train this model, we collect a large-scale video dataset featuring single-human activities, annotating each video with the corresponding human motion as the target output. Experiments demonstrate that our method effectively predicts human motion that aligns with the scene image after projection. Furthermore, we show that the generated motion sequence improves human motion quality in video synthesis tasks.
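To make the idea of conditional diffusion sampling concrete, the following is a minimal sketch of a standard DDPM-style reverse loop that takes a scene embedding and a text embedding as conditions. It is not the paper's implementation: `toy_denoiser`, `sample_motion`, the flat-vector motion representation, and the linear noise schedule are all hypothetical choices made for illustration, and the "denoiser" is a closed-form stand-in for a trained network.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumption; the paper's schedule is not stated here)."""
    betas = [
        beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        for t in range(num_steps)
    ]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def toy_denoiser(x_t, scene_emb, text_emb, alpha_bar_t):
    """Hypothetical stand-in for a trained noise-prediction network.

    It pretends the clean motion is the mean of the two condition vectors and
    inverts x_t = sqrt(ab) * x0 + sqrt(1 - ab) * eps to recover eps.
    """
    x0 = [(s + c) / 2.0 for s, c in zip(scene_emb, text_emb)]
    scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    return [(x - scale * target) / noise_scale for x, target in zip(x_t, x0)]


def sample_motion(scene_emb, text_emb, num_steps=50, seed=0):
    """Run the DDPM reverse process from pure noise, conditioned on both inputs."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    # Start from Gaussian noise with the same size as the (toy) motion vector.
    x = [rng.gauss(0.0, 1.0) for _ in scene_emb]
    for t in reversed(range(num_steps)):
        eps = toy_denoiser(x, scene_emb, text_emb, alpha_bars[t])
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            # Add fresh noise on every step except the last one.
            sigma = math.sqrt(betas[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x


if __name__ == "__main__":
    motion = sample_motion(scene_emb=[1.0, 2.0], text_emb=[3.0, 4.0])
    print(motion)
```

In a real system the condition vectors would come from image and text encoders, the denoiser would be a learned network, and the output would be a sequence of poses rather than a short flat vector. The sketch only shows how both conditions enter every denoising step.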
