CV | Feb 8, 2025

Fg-T2M++: LLMs-Augmented Fine-Grained Text Driven Human Motion Generation

arXiv:2502.05534v1 | 46 citations | h-index: 5 | Int J Comput Vis
Originality: Incremental advance
AI Analysis

This work improves text-to-motion alignment for applications such as animation and robotics. The advance is incremental: it builds on existing methods and adds new components to them.

The paper tackles fine-grained text-driven human motion generation. It attributes imprecise motions to two causes, ineffective text parsing and incomplete modeling of linguistic structure, and proposes a framework that outperforms state-of-the-art methods on the HumanML3D and KIT-ML datasets.

We address the challenging problem of fine-grained text-driven human motion generation. Existing works generate imprecise motions that fail to accurately capture the relationships specified in the text, for two reasons: (1) they lack effective text parsing for detailed semantic cues about body parts, and (2) they do not fully model the linguistic structures between words, so the text is not understood comprehensively. To tackle these limitations, we propose a novel fine-grained framework, Fg-T2M++, that consists of: (1) an LLM-based semantic parsing module that extracts body part descriptions and semantics from text, (2) a hyperbolic text representation module that encodes relational information between text units by embedding the syntactic dependency graph into hyperbolic space, and (3) a multi-modal fusion module that hierarchically fuses text and motion features. Extensive experiments on the HumanML3D and KIT-ML datasets demonstrate that Fg-T2M++ outperforms SOTA methods, validating its ability to generate motions that adhere to comprehensive text semantics.
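
To make the idea of embedding a dependency graph into hyperbolic space more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the Poincaré ball model with curvature -1, which the abstract does not specify. The function names, the toy sentence, and the two-dimensional tangent vectors are all hypothetical and chosen only for illustration. The sketch maps Euclidean vectors onto the ball and measures hyperbolic distance, showing how tokens deeper in a dependency tree can sit closer to the boundary of the ball.

```python
import math

def exp_map_origin(v):
    """Map a Euclidean (tangent) vector at the origin onto the Poincare ball.

    Assumes curvature -1: the result has norm tanh(||v||) < 1.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    scale = math.tanh(norm) / norm
    return [scale * x for x in v]

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))

# Hypothetical tangent vectors for tokens of "a person raises his left arm".
# The root verb sits at the origin; deeper dependents get larger norms.
tangent = {
    "raises": [0.0, 0.0],   # root of the dependency tree (assumed)
    "person": [0.5, 0.0],   # subject, depth 1
    "arm":    [0.0, 1.0],   # object, depth 1
    "left":   [0.0, 2.0],   # modifier of "arm", depth 2
}
embedded = {tok: exp_map_origin(vec) for tok, vec in tangent.items()}

for tok in ("person", "arm", "left"):
    d = poincare_distance(embedded["raises"], embedded[tok])
    print(f"distance(raises, {tok}) = {d:.3f}")
```

Under this convention, the distance from the origin to an embedded point is twice the norm of its tangent vector, so distance grows steadily with tree depth even though every point stays inside the unit ball. That property is one common motivation for using hyperbolic space for tree-like structures; whether Fg-T2M++ uses this exact model or parameterization is not stated in the abstract.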

