CV | Oct 29, 2024

MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding

arXiv:2410.21747v1 | 52 citations | h-index: 23
Originality: Incremental advance
AI Analysis

This work addresses the need for more flexible, general-purpose motion generation models for digital humans, though it appears incremental, since it builds on existing large language models and quantization techniques.

The paper tackles the problem of generating lifelike human motions from text and other multimodal inputs by introducing MotionGPT-2, a unified large motion-language model that integrates text and pose tokens into a language model framework, achieving adaptability across tasks like motion generation, captioning, and completion.

Generating lifelike human motions from descriptive texts has attracted remarkable research attention in recent years, propelled by the emerging requirements of digital humans. Despite impressive advances, existing approaches are often constrained by limited control modalities, task specificity, and a sole focus on body motion representations. In this paper, we present MotionGPT-2, a unified Large Motion-Language Model (LMLM) that addresses these limitations. MotionGPT-2 accommodates multiple motion-relevant tasks and supports multimodal control conditions through pre-trained Large Language Models (LLMs). It quantizes multimodal inputs, such as text and single-frame poses, into discrete, LLM-interpretable tokens, seamlessly integrating them into the LLM's vocabulary. These tokens are then organized into unified prompts, guiding the LLM to generate motion outputs through a pretraining-then-finetuning paradigm. We also show that the proposed MotionGPT-2 is highly adaptable to the challenging 3D holistic motion generation task, enabled by the innovative motion discretization framework, Part-Aware VQVAE, which ensures fine-grained representations of body and hand movements. Extensive experiments and visualizations validate the effectiveness of our method, demonstrating the adaptability of MotionGPT-2 across motion generation, motion captioning, and generalized motion completion tasks.
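The quantize-then-prompt idea described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation. It assumes a tiny hand-written codebook in place of a learned VQ-VAE, a hypothetical token format `<motion_id_k>`, and a hypothetical prompt template. It does not model the Part-Aware VQVAE or the LLM itself. Every name below is illustrative.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_code(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def quantize(frames: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each continuous feature vector to a discrete codebook index."""
    return [nearest_code(frame, codebook) for frame in frames]


def to_tokens(indices: Sequence[int], prefix: str = "motion_id") -> List[str]:
    """Render indices as token strings that could extend an LLM vocabulary.

    The "<motion_id_k>" format is an assumption made for this sketch.
    """
    return [f"<{prefix}_{i}>" for i in indices]


def build_prompt(instruction: str, text: str, pose_tokens: Sequence[str]) -> str:
    """Assemble text and pose tokens into one prompt (hypothetical template)."""
    parts = [f"Instruction: {instruction}", f"Text: {text}"]
    if pose_tokens:
        parts.append("Pose: " + "".join(pose_tokens))
    parts.append("Motion:")
    return "\n".join(parts)


# Toy 2-D codebook; a real model would learn high-dimensional codes.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
frames = [(0.1, -0.1), (0.9, 0.2), (0.8, 0.9)]

indices = quantize(frames, CODEBOOK)   # [0, 1, 3]
tokens = to_tokens(indices)
prompt = build_prompt(
    "Generate a motion matching the text and the initial pose.",
    "a person waves",
    tokens[:1],  # condition on a single-frame pose token
)
print(prompt)
```

In this sketch the language model would be asked to continue the prompt after `Motion:` with further motion tokens, which a decoder would then map back to continuous poses.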

