CV · Nov 24, 2025

ReAlign: Text-to-Motion Generation via Step-Aware Reward-Guided Alignment

arXiv:2511.19217v1 · 6 citations
Originality: Incremental advance
AI Analysis

This work addresses a key limitation in generating realistic, semantically consistent 3D human motions for applications such as gaming and robotics. It represents an incremental advance.

The paper tackles the misalignment between text and motion distributions in diffusion models for text-to-motion generation, proposing ReAlign to improve alignment and quality, with experiments showing significant improvements over state-of-the-art methods.

Text-to-motion generation, which synthesizes 3D human motions from text inputs, holds immense potential for applications in gaming, film, and robotics. Recently, diffusion-based methods have been shown to generate more diverse and realistic motions. However, diffusion models suffer from a misalignment between the text and motion distributions, which leads to semantically inconsistent or low-quality motions. To address this limitation, we propose Reward-guided sampling Alignment (ReAlign), comprising a step-aware reward model that assesses alignment quality during denoising sampling and a reward-guided strategy that directs the diffusion process toward an optimally aligned distribution. The reward model integrates step-aware tokens and combines a text-aligned module for semantic consistency with a motion-aligned module for realism, refining noisy motions at each timestep to balance probability density and alignment. Extensive experiments on both motion generation and retrieval tasks demonstrate that our approach significantly improves text-motion alignment and motion quality compared to existing state-of-the-art methods.
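
The general idea of reward-guided sampling can be illustrated with a toy sketch. This is not the ReAlign implementation: it assumes a 2D standard Gaussian in place of a learned motion diffusion model, a hand-written quadratic reward in place of the learned step-aware reward model, and plain Langevin updates in place of the paper's sampler. The names `score_prior`, `reward_grad`, `guided_sample`, and the weight schedule `w = 1 - t` are hypothetical and chosen only for illustration.

```python
import numpy as np


def score_prior(x):
    # Score (gradient of the log-density) of a standard Gaussian.
    # Assumption: stands in for a learned diffusion score network.
    return -x


def reward_grad(x, t, target):
    # Gradient of a hypothetical step-aware reward
    #   r(x, t) = -0.5 * w(t) * ||x - target||^2.
    # Assumption: w(t) = 1 - t, so the reward is trusted more as noise decreases.
    w = 1.0 - t
    return -w * (x - target)


def guided_sample(n, target, guidance=0.0, steps=200, step_size=0.05, seed=0):
    # Langevin-style sampling whose drift adds the reward gradient to the score.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, target.shape[0]))
    for i in range(steps):
        t = 1.0 - i / steps  # runs from 1 (noisy) toward 0 (clean)
        drift = score_prior(x) + guidance * reward_grad(x, t, target)
        noise = rng.standard_normal(x.shape)
        x = x + step_size * drift + np.sqrt(2.0 * step_size) * noise
    return x


target = np.array([2.0, -1.0])  # toy stand-in for "what the text asks for"
plain = guided_sample(2000, target, guidance=0.0)
guided = guided_sample(2000, target, guidance=3.0)

print("unguided mean:", plain.mean(axis=0))
print("guided mean:  ", guided.mean(axis=0))
```

In this toy setting the guided samples shift toward `target` without collapsing onto it, because the prior score still pulls them toward high-density regions. That is a simplified analogue of the balance between probability density and alignment described in the abstract.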

