CV, MM | Sep 18, 2024

MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion

arXiv:2409.12140v2 | 9 citations | h-index: 18
Originality: Synthesis-oriented
AI Analysis

The paper addresses text-driven human motion generation for unseen text descriptions, a problem relevant to computer graphics and animation. It is an incremental advance in retrieval-augmented methods for this domain.

The paper introduces MoRAG, a multi-fusion retrieval-augmented generation strategy for text-based human motion generation. MoRAG enhances motion diffusion models through improved motion retrieval and LLM prompting, and the authors report that it works as a plug-and-play module that improves the performance of those models.

We introduce MoRAG, a novel multi-part fusion based retrieval-augmented generation strategy for text-based human motion generation. The method enhances motion diffusion models by leveraging additional knowledge obtained through an improved motion retrieval process. By effectively prompting large language models (LLMs), we address spelling errors and rephrasing issues in motion retrieval. Our approach utilizes a multi-part retrieval strategy to improve the generalizability of motion retrieval across the language space. We create diverse samples through the spatial composition of the retrieved motions. Furthermore, by utilizing low-level, part-specific motion information, we can construct motion samples for unseen text descriptions. Our experiments demonstrate that our framework can serve as a plug-and-play module, improving the performance of motion diffusion models. Code, pretrained models and sample videos are available at: https://motion-rag.github.io/
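The multi-part retrieval and spatial composition steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the body-part split, the 6-joint toy skeleton, the 2-D embeddings, and the helper names (`PART_JOINTS`, `retrieve`, `compose`, `constant_motion`) are all assumptions made for illustration, and the part-specific query vectors are simply given rather than derived from LLM output.

```python
import math

# Hypothetical body-part split over a toy 6-joint skeleton (not the paper's).
PART_JOINTS = {"torso": [0, 1], "hands": [2, 3], "legs": [4, 5]}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, database):
    """Return the motion whose embedding is most similar to query_vec."""
    best = max(database, key=lambda entry: cosine(query_vec, entry["embedding"]))
    return best["motion"]


def compose(part_motions):
    """Build a full-body motion by taking each part's joints from its own retrieved motion."""
    num_frames = min(len(m) for m in part_motions.values())
    num_joints = sum(len(j) for j in PART_JOINTS.values())
    composed = [[None] * num_joints for _ in range(num_frames)]
    for part, joints in PART_JOINTS.items():
        for t in range(num_frames):
            for j in joints:
                composed[t][j] = part_motions[part][t][j]
    return composed


def constant_motion(value, frames=3, joints=6):
    """Toy motion: every joint holds the same scalar value in every frame."""
    return [[value] * joints for _ in range(frames)]


# Toy database: two motions with made-up 2-D text embeddings.
database = [
    {"embedding": [1.0, 0.0], "motion": constant_motion(1.0)},
    {"embedding": [0.0, 1.0], "motion": constant_motion(2.0)},
]

# Assumed part-specific query embeddings (stand-ins for embedded part descriptions).
part_queries = {"torso": [0.9, 0.1], "hands": [0.1, 0.9], "legs": [1.0, 0.0]}

# Retrieve one motion per body part, then compose them spatially.
part_motions = {part: retrieve(q, database) for part, q in part_queries.items()}
composed = compose(part_motions)
```

In this toy setup the torso and legs are taken from the first motion and the hands from the second, so a description never seen as a whole can still be assembled from part-level matches. How such a composed sample conditions the diffusion model is outside the scope of this sketch.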

Code Implementations: 1 repo
