CV | Apr 4, 2025

Shape My Moves: Text-Driven Shape-Aware Synthesis of Human Motions

arXiv:2504.03639v1 | 9 citations | h-index: 19 | CVPR
Originality: Incremental advance
AI Analysis

This work addresses a gap in text-to-motion generation for applications that need realistic human animation. The advance is incremental: it builds on existing components such as FSQ-VAE and pretrained language models.

The paper generates human motion from text prompts while accounting for the often-overlooked influence of body shape, and it supports the method with quantitative, qualitative, and perceptual evaluations.

We explore how body shapes influence human motion synthesis, an aspect often overlooked in existing text-to-motion generation methods due to the ease of learning a homogenized, canonical body shape. However, this homogenization can distort the natural correlations between different body shapes and their motion dynamics. Our method addresses this gap by generating body-shape-aware human motions from natural language prompts. We utilize a finite scalar quantization-based variational autoencoder (FSQ-VAE) to quantize motion into discrete tokens and then leverage continuous body shape information to de-quantize these tokens back into continuous, detailed motion. Additionally, we harness the capabilities of a pretrained language model to predict both continuous shape parameters and motion tokens, facilitating the synthesis of text-aligned motions and decoding them into shape-aware motions. We evaluate our method quantitatively and qualitatively, and also conduct a comprehensive perceptual study to demonstrate its efficacy in generating shape-aware motions.
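The quantize and de-quantize steps described in the abstract can be illustrated with a minimal NumPy sketch of finite scalar quantization. This is not the authors' implementation. The level configuration `LEVELS`, the restriction to odd level counts, the 10-dimensional shape vector, and the choice to condition the decoder by concatenating shape parameters to each frame are all assumptions made for illustration. The helper names (`fsq_quantize`, `codes_to_tokens`, `tokens_to_codes`, `decoder_input`) are hypothetical.

```python
import numpy as np

# Hypothetical configuration: number of quantization levels per latent
# dimension. Odd values keep the rounding grid symmetric around zero.
LEVELS = np.array([7, 5, 5, 5])


def fsq_quantize(z, levels=LEVELS):
    """Bound each latent dimension with tanh, then round to an integer grid."""
    half = (levels - 1) / 2.0
    bounded = np.tanh(z) * half          # values in [-half, half]
    return np.round(bounded).astype(np.int64)


def _basis(levels):
    # Mixed-radix place values: 1, L0, L0*L1, ...
    return np.concatenate(([1], np.cumprod(levels[:-1])))


def codes_to_tokens(codes, levels=LEVELS):
    """Map each per-frame code vector to a single discrete token id."""
    digits = codes + (levels - 1) // 2   # shift to [0, L-1]
    return (digits * _basis(levels)).sum(axis=-1)


def tokens_to_codes(tokens, levels=LEVELS):
    """Invert codes_to_tokens: recover the integer code vector per token."""
    digits = (tokens[..., None] // _basis(levels)) % levels
    return digits - (levels - 1) // 2


def decoder_input(codes, shape_params, levels=LEVELS):
    """Assumed conditioning scheme: rescale codes to [-1, 1] and append the
    same continuous shape vector to every frame before decoding."""
    normalized = codes / ((levels - 1) / 2.0)
    tiled = np.broadcast_to(shape_params, (codes.shape[0], shape_params.shape[-1]))
    return np.concatenate([normalized, tiled], axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latents = rng.normal(size=(6, 4))    # 6 frames, 4 latent dimensions
    codes = fsq_quantize(latents)
    tokens = codes_to_tokens(codes)
    shape = rng.normal(size=(10,))       # assumed 10-dim body shape vector
    print(tokens.shape, decoder_input(tokens_to_codes(tokens), shape).shape)
```

With these levels the implicit codebook has 7 × 5 × 5 × 5 = 875 entries, so a language model would predict token ids in the range 0 to 874, while the shape vector stays continuous and enters only at decoding time.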

