CV · Apr 27

MARRS: Masked Autoregressive Unit-based Reaction Synthesis

arXiv:2505.11334 · 62.91 citations · h-index: 10
Predicted impact: top 38% in CV · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses the challenging task of generating coordinated human reactions for interactive motion synthesis, which is important for applications like virtual reality and human-robot interaction.

MARRS tackles human action-reaction synthesis, generating reaction motions conditioned on another person's actions. It achieves superior performance over existing methods in both quantitative and qualitative evaluations.

This work aims at a challenging task: human action-reaction synthesis, i.e., generating human reactions conditioned on the action sequence of another person. Currently, autoregressive modeling approaches with vector quantization (VQ) have achieved remarkable performance in motion generation tasks. However, VQ has inherent disadvantages, including quantization information loss, low codebook utilization, etc. In addition, while dividing the body into separate units can be beneficial, the computational complexity needs to be considered. Also, the importance of mutual perception among units is often neglected. In this work, we propose MARRS, a novel framework designed to generate coordinated and fine-grained reaction motions using continuous representations. Initially, we present the Unit-distinguished Motion Variational AutoEncoder (UD-VAE), which segments the entire body into distinct body and hand units, encoding each independently. Subsequently, we propose Action-Conditioned Fusion (ACF), which involves randomly masking a subset of reactive tokens and extracting specific information about the body and hands from the active tokens. Furthermore, we introduce Mutual Unit Modulation (MUM) to facilitate interaction between body and hand units by using the information from one unit to adaptively modulate the other. Finally, for the diffusion model, we employ a compact MLP as a noise predictor for each distinct body unit and incorporate the diffusion loss to model the probability distribution of each token. Both quantitative and qualitative results demonstrate that our method achieves superior performance. Project page: https://aigc-explorer.github.io/MARRS/.
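The abstract names two mechanisms that are easier to picture in code: randomly masking a subset of reactive tokens, and letting the body and hand units modulate each other. The following is a minimal pure-Python sketch of those two ideas, not the authors' implementation. The function names, the masking ratio, and the scale-and-shift (FiLM-style) form of the modulation are all assumptions made for illustration; the abstract only says that one unit's information is used to "adaptively modulate" the other.

```python
import random


def random_mask(num_tokens, mask_ratio, rng):
    """Return a list of booleans; True marks a reaction token chosen for masking.

    Assumption: a fixed fraction of tokens is masked uniformly at random.
    """
    num_masked = round(mask_ratio * num_tokens)
    masked_idx = set(rng.sample(range(num_tokens), num_masked))
    return [i in masked_idx for i in range(num_tokens)]


def linear(x, weight, bias):
    """Compute W x + b for one vector x, with weight given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def modulate(target, source, w_scale, b_scale, w_shift, b_shift):
    """Scale and shift `target` with parameters predicted from `source`.

    Assumption: FiLM-style modulation, target * (1 + scale) + shift.
    With all-zero weights and biases this reduces to the identity.
    """
    scale = linear(source, w_scale, b_scale)
    shift = linear(source, w_shift, b_shift)
    return [t * (1.0 + s) + h for t, s, h in zip(target, scale, shift)]


def mutual_modulation(body, hand, params_body, params_hand):
    """Modulate the body token by the hand token and vice versa (hypothetical helper)."""
    new_body = modulate(body, hand, *params_body)
    new_hand = modulate(hand, body, *params_hand)
    return new_body, new_hand


if __name__ == "__main__":
    rng = random.Random(0)
    mask = random_mask(num_tokens=8, mask_ratio=0.5, rng=rng)
    print("masked positions:", [i for i, m in enumerate(mask) if m])

    zeros = [[0.0, 0.0], [0.0, 0.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    no_bias = [0.0, 0.0]
    # Body is scaled by the hand token; hand passes through unchanged.
    params_body = (identity, no_bias, zeros, no_bias)
    params_hand = (zeros, no_bias, zeros, no_bias)
    print(mutual_modulation([1.0, 2.0], [1.0, 1.0], params_body, params_hand))
```

In this sketch both updates read the original tokens, so neither unit sees the other's already-modulated output. Whether MARRS applies the two directions in parallel or in sequence is not stated in the abstract.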

