CV · Dec 15, 2024

Light-T2M: A Lightweight and Fast Model for Text-to-motion Generation

arXiv:2412.11193v1 · 25 citations · h-index: 8 · Has Code · AAAI
Originality: Incremental advance
AI Analysis

This work addresses the high usage costs faced by applications that need efficient text-to-motion generation. It is an incremental advance that builds on existing methods with optimizations.

The paper tackles the problem of high parameter counts and slow inference in text-to-motion generation by proposing Light-T2M, a lightweight model that uses about 10% of the parameters of MoMask (4.48M vs. 44.85M) and achieves 16% faster inference (0.152s vs. 0.180s) while improving FID scores on the HumanML3D and KIT-ML datasets.

Despite the significant role text-to-motion (T2M) generation plays across various applications, current methods involve a large number of parameters and suffer from slow inference speeds, leading to high usage costs. To address this, we aim to design a lightweight model to reduce usage costs. First, unlike existing works that focus solely on global information modeling, we recognize the importance of local information modeling in the T2M task by reconsidering the intrinsic properties of human motion, leading us to propose a lightweight Local Information Modeling Module. Second, we introduce Mamba to the T2M task, reducing the number of parameters and GPU memory demands, and we have designed a novel Pseudo-bidirectional Scan to replicate the effects of a bidirectional scan without increasing parameter count. Moreover, we propose a novel Adaptive Textual Information Injector that more effectively integrates textual information into the motion during generation. By integrating the aforementioned designs, we propose a lightweight and fast model named Light-T2M. Compared to the state-of-the-art method, MoMask, our Light-T2M model features just 10% of the parameters (4.48M vs 44.85M) and achieves a 16% faster inference time (0.152s vs 0.180s), while surpassing MoMask with an FID of 0.040 (vs. 0.045) on the HumanML3D dataset and 0.161 (vs. 0.228) on the KIT-ML dataset. The code is available at https://github.com/qinghuannn/light-t2m.
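
The headline percentages follow directly from the figures quoted in the abstract. The short Python sketch below recomputes them as a sanity check. The variable names are illustrative, and reading "16% faster" as a reduction in inference time (rather than an increase in throughput) is an assumption about how the authors rounded.

```python
# Figures quoted in the abstract (Light-T2M vs. MoMask).
params_light_m, params_momask_m = 4.48, 44.85   # parameters, in millions
time_light_s, time_momask_s = 0.152, 0.180      # inference time, in seconds
fid_light_h3d, fid_momask_h3d = 0.040, 0.045    # FID on HumanML3D (lower is better)
fid_light_kit, fid_momask_kit = 0.161, 0.228    # FID on KIT-ML (lower is better)


def fraction_of(ours: float, baseline: float) -> float:
    """Return ours as a fraction of the baseline."""
    return ours / baseline


def relative_reduction(ours: float, baseline: float) -> float:
    """Return how much smaller ours is than the baseline, as a fraction of the baseline."""
    return (baseline - ours) / baseline


param_fraction = fraction_of(params_light_m, params_momask_m)
time_reduction = relative_reduction(time_light_s, time_momask_s)
speedup = time_momask_s / time_light_s  # alternative reading: throughput ratio
fid_gain_h3d = relative_reduction(fid_light_h3d, fid_momask_h3d)
fid_gain_kit = relative_reduction(fid_light_kit, fid_momask_kit)

print(f"parameters: {param_fraction:.1%} of MoMask")
print(f"inference time reduced by {time_reduction:.1%} ({speedup:.2f}x speedup)")
print(f"FID reduced by {fid_gain_h3d:.1%} on HumanML3D, {fid_gain_kit:.1%} on KIT-ML")
```

Under the time-reduction reading, the result rounds to the 16% stated in the abstract. Expressed as a throughput ratio, the same measurements give a somewhat larger figure, so the two readings should not be mixed when comparing against other papers.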

Code Implementations: 1 repo
