MotionPCM: Real-Time Motion Synthesis with Phased Consistency Model
This work addresses the inefficiency of diffusion-based motion generation in real-time applications such as animation or robotics, and represents an incremental improvement over existing consistency models.
The paper tackles real-time human motion synthesis by proposing MotionPCM, a phased consistency model that operates in latent space and reduces the number of sampling steps. With a single sampling step, it runs at over 30 FPS and improves FID by 38.9% over the previous state of the art on the HumanML3D dataset.
Diffusion models have become a popular choice for human motion synthesis due to their powerful generative capabilities. However, their high computational complexity and large number of sampling steps pose challenges for real-time applications. Fortunately, the Consistency Model (CM) offers a way to reduce the number of sampling steps from hundreds to a few, typically fewer than four, significantly accelerating sampling from diffusion models. However, applying CM to text-conditioned human motion synthesis in latent space yields unsatisfactory generation results. In this paper, we introduce \textbf{MotionPCM}, a phased consistency model-based approach designed to improve both the quality and the efficiency of real-time motion synthesis in latent space. Experimental results on the HumanML3D dataset show that our model achieves real-time inference at over 30 frames per second in a single sampling step while outperforming the previous state of the art with a 38.9\% improvement in FID. The code will be made available for reproduction.