CV | LG | Dec 7, 2023

DiffusionPhase: Motion Diffusion in Frequency Domain

arXiv:2312.04036v1 | 14 citations | h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses the challenge of text-to-motion generation for applications such as animation and robotics. It is an incremental advance: it builds on existing diffusion models, and its novelty lies in moving generation to the frequency domain.

The paper tackles the problem of generating high-quality human motion sequences from text descriptions by introducing a method that operates in the frequency domain. The authors report that it outperforms current methods in motion diversity and in smooth transitions for long sequences.

In this study, we introduce a learning-based method for generating high-quality human motion sequences from text descriptions (e.g., "A person walks forward"). Existing techniques struggle with motion diversity and smooth transitions in generating arbitrary-length motion sequences, due to limited text-to-motion datasets and the pose representations used that often lack expressiveness or compactness. To address these issues, we propose the first method for text-conditioned human motion generation in the frequency domain of motions. We develop a network encoder that converts the motion space into a compact yet expressive parameterized phase space with high-frequency details encoded, capturing the local periodicity of motions in time and space with high accuracy. We also introduce a conditional diffusion model for predicting periodic motion parameters based on text descriptions and a start pose, efficiently achieving smooth transitions between motion sequences associated with different text descriptions. Experiments demonstrate that our approach outperforms current methods in generating a broader variety of high-quality motions, and synthesizing long sequences with natural transitions.
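
The abstract does not spell out what the "periodic motion parameters" are. As a rough illustration only, the sketch below assumes that a single motion channel can be summarized by an amplitude, a frequency, an offset, and a phase, and recovers those four numbers from a sampled signal with a plain discrete Fourier transform. The function names `dominant_sinusoid` and `reconstruct` are hypothetical, and the paper itself uses a learned network encoder rather than this hand-written transform.

```python
import cmath
import math


def dominant_sinusoid(signal, fps):
    """Estimate (amplitude, frequency, offset, phase) of the strongest
    sinusoid in a 1D signal sampled at `fps` frames per second.

    Illustrative assumption: one channel is modeled as
    amplitude * cos(2*pi*frequency*t + phase) + offset.
    """
    n = len(signal)
    offset = sum(signal) / n
    centered = [x - offset for x in signal]

    best_k, best_coeff = 1, 0j
    # Scan the DFT bins, skipping the DC bin (0) and the Nyquist bin.
    for k in range(1, n // 2):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(centered)
        )
        if abs(coeff) > abs(best_coeff):
            best_k, best_coeff = k, coeff

    amplitude = 2.0 * abs(best_coeff) / n
    frequency = best_k * fps / n  # in Hz
    phase = cmath.phase(best_coeff)
    return amplitude, frequency, offset, phase


def reconstruct(amplitude, frequency, offset, phase, n, fps):
    """Rebuild n samples of the signal from its four periodic parameters."""
    return [
        amplitude * math.cos(2 * math.pi * frequency * i / fps + phase) + offset
        for i in range(n)
    ]


# Toy channel: a 2 Hz oscillation sampled at 30 fps for 2 seconds.
fps, n = 30, 60
toy = [1.5 * math.cos(2 * math.pi * 2.0 * i / fps + 0.7) + 0.3 for i in range(n)]
params = dominant_sinusoid(toy, fps)
rebuilt = reconstruct(*params, n=n, fps=fps)
```

Four numbers stand in for sixty samples here, which is the sense in which such a representation is compact. A model that predicts parameters of this kind from text could, in principle, generate sequences of any length by evaluating the sinusoid over more frames.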

