CV | Oct 3, 2025

MoGIC: Boosting Motion Generation via Intention Understanding and Visual Context

arXiv:2510.02722v1 | h-index: 2 | Has Code
Originality: Highly original
AI Analysis

This work addresses limitations in motion synthesis for applications such as animation and robotics by improving controllability and intention understanding, though it is incremental in that it builds on existing multimodal methods.

The paper tackles text-driven motion generation by proposing MoGIC, a framework that integrates intention modeling and visual priors to improve precision and personalization. After finetuning, it reduces FID by 38.6% on HumanML3D and 34.6% on Mo440H.

Existing text-driven motion generation methods often treat synthesis as a bidirectional mapping between language and motion, but remain limited in capturing the causal logic of action execution and the human intentions that drive behavior. The absence of visual grounding further restricts precision and personalization, as language alone cannot specify fine-grained spatiotemporal details. We propose MoGIC, a unified framework that integrates intention modeling and visual priors into multimodal motion synthesis. By jointly optimizing multimodal-conditioned motion generation and intention prediction, MoGIC uncovers latent human goals, leverages visual priors to enhance generation, and exhibits versatile multimodal generative capability. We further introduce a mixture-of-attention mechanism with adaptive scope to enable effective local alignment between conditional tokens and motion subsequences. To support this paradigm, we curate Mo440H, a 440-hour benchmark from 21 high-quality motion datasets. Experiments show that after finetuning, MoGIC reduces FID by 38.6% on HumanML3D and 34.6% on Mo440H, surpasses LLM-based methods in motion captioning with a lightweight text head, and further enables intention prediction and vision-conditioned generation, advancing controllable motion synthesis and intention understanding. The code is available at https://github.com/JunyuShi02/MoGIC
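The headline numbers above are relative reductions in FID. As a reading aid, here is a minimal sketch of how a Fréchet distance between two sets of feature vectors is commonly computed, and how a relative reduction is derived from two scores. It assumes the features have already been produced by some pretrained motion feature extractor, which is not shown; the abstract does not describe the evaluation pipeline, so the function names and the example scores are illustrative assumptions, not values or code from the paper.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (num_samples, feature_dim).
    Assumption: features come from an external extractor (not shown).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr((cov_a cov_b)^(1/2)) equals the sum of square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite inputs (up to numerical noise, hence the clip).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 8))       # stand-in for real-motion features
    generated = real + 0.5                 # stand-in for generated-motion features
    print(frechet_distance(real, generated))
    # Hypothetical scores, chosen only to illustrate the arithmetic.
    print(relative_reduction(0.200, 0.150))
```

Lower is better for this metric, so a reported "38.6% reduction" means the new score is 61.4% of the score it is being compared against.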

