CV · LG · Apr 3, 2023

Controllable Motion Synthesis and Reconstruction with Autoregressive Diffusion Models

arXiv:2304.04681v1 · 3 citations · h-index: 39
Originality: Incremental advance
AI Analysis

This work addresses challenges in human motion synthesis for applications like interactive media and social robotics, representing an incremental improvement through a novel hybrid approach.

The paper tackles the problem of generating diverse, controllable human motions from imperfect input poses by introducing MoDiff, an autoregressive diffusion model that integrates cross-modal Transformers and a novel diffusion-based data dropout method. The model demonstrates superior performance in locomotion synthesis compared to two baselines and achieves robust, high-fidelity motion reconstruction close to recorded data.

Data-driven and controllable human motion synthesis and prediction are active research areas with various applications in interactive media and social robotics. Challenges remain in these fields for generating diverse motions given past observations and dealing with imperfect poses. This paper introduces MoDiff, an autoregressive probabilistic diffusion model over motion sequences conditioned on control contexts of other modalities. Our model integrates a cross-modal Transformer encoder and a Transformer-based decoder, which are found effective in capturing temporal correlations in motion and control modalities. We also introduce a new data dropout method based on the diffusion forward process to provide richer data representations and robust generation. We demonstrate the superior performance of MoDiff in controllable motion synthesis for locomotion with respect to two baselines and show the benefits of diffusion data dropout for robust synthesis and reconstruction of high-fidelity motion close to recorded data.
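The diffusion data dropout mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard DDPM forward process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, applied to the past poses used as conditioning. The linear beta schedule, the pose dimensions, and the function names (linear_beta_schedule, forward_diffuse, diffusion_dropout) are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed choice, common in DDPM-style models)."""
    return np.linspace(beta_start, beta_end, num_steps)


def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) using the closed-form forward process."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps


def diffusion_dropout(past_poses, alpha_bar, rng):
    """Corrupt conditioning poses at a randomly drawn diffusion step.

    Hypothetical usage: during training, the model would see these noised
    poses as context instead of the clean recorded ones.
    """
    t = int(rng.integers(0, len(alpha_bar)))
    return forward_diffuse(past_poses, t, alpha_bar, rng), t


# Example: 8 past frames, each a 63-dimensional pose vector (assumed sizes).
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_s)
rng = np.random.default_rng(0)
past_poses = rng.standard_normal((8, 63))
noisy_poses, step = diffusion_dropout(past_poses, alpha_bar, rng)
```

Drawing the step at random exposes the model to conditioning poses at many corruption levels, from nearly clean to nearly pure noise, which is one plausible way to obtain the robustness to imperfect poses that the abstract describes.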

