Modiff: Action-Conditioned 3D Motion Generation with Denoising Diffusion Probabilistic Models
The paper addresses the problem of generating varied and realistic human motions for applications such as animation and robotics, though the contribution appears incremental because it applies an existing diffusion method to a new task.
The paper tackles realistic and diverse action-conditioned 3D skeleton-based motion generation using a denoising diffusion probabilistic model, and reports improvements over state-of-the-art methods on the NTU RGB+D dataset.
Diffusion-based generative models have recently emerged as powerful solutions for high-quality synthesis in multiple domains. Leveraging bidirectional Markov chains, diffusion probabilistic models generate samples by inferring the reversed Markov chain from the distribution mapping learned in the forward diffusion process. In this work, we propose Modiff, a conditional paradigm that benefits from the denoising diffusion probabilistic model (DDPM) to tackle the problem of realistic and diverse action-conditioned 3D skeleton-based motion generation. This is a pioneering attempt to use DDPM to synthesize a variable number of motion sequences conditioned on a categorical action. We evaluate our approach on the large-scale NTU RGB+D dataset and show improvements over state-of-the-art motion generation methods.
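The forward diffusion process and the reversed Markov chain mentioned in the abstract can be illustrated with a minimal sketch of the standard DDPM equations. This is not the Modiff implementation. The function names, the linear noise schedule, the step count, and the flat-list representation of a motion sequence are all assumptions made for illustration, and `dummy_eps_model` is a hypothetical stand-in for a trained action-conditioned noise predictor.

```python
import math
import random


def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumed choice) plus cumulative alpha products."""
    step = (beta_end - beta_start) / max(num_steps - 1, 1)
    betas = [beta_start + i * step for i in range(num_steps)]
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta  # alpha_t = 1 - beta_t
        alpha_bars.append(running)
    return betas, alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Forward process: draw x_t from q(x_t | x_0) in closed form."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    xt = [signal_scale * x + noise_scale * n for x, n in zip(x0, noise)]
    return xt, noise


def p_sample_step(xt, t, action, eps_model, betas, alpha_bars, rng):
    """One reverse step x_t -> x_{t-1}, conditioned on an action label."""
    eps_hat = eps_model(xt, t, action)
    coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
    inv_sqrt_alpha = 1.0 / math.sqrt(1.0 - betas[t])
    mean = [inv_sqrt_alpha * (x - coef * e) for x, e in zip(xt, eps_hat)]
    if t == 0:
        return mean  # no noise is added at the final step
    sigma = math.sqrt(betas[t])  # variance choice sigma_t^2 = beta_t
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


def sample(num_values, action, eps_model, betas, alpha_bars, rng):
    """Run the full reverse chain starting from Gaussian noise."""
    x = [rng.gauss(0.0, 1.0) for _ in range(num_values)]
    for t in reversed(range(len(betas))):
        x = p_sample_step(x, t, action, eps_model, betas, alpha_bars, rng)
    return x


def dummy_eps_model(xt, t, action):
    """Hypothetical placeholder: a trained network would predict the noise here."""
    return [0.0 for _ in xt]
```

In this sketch a motion sequence (frames by joints by 3 coordinates) would be flattened into one list of floats, and `action` is an integer class index. Drawing several samples for the same `action` with different random seeds is what would yield multiple distinct motions per category. A practical implementation would use an array library and a learned network in place of `dummy_eps_model`.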