ActFormer: A GAN-based Transformer towards General Action-Conditioned 3D Human Motion Generation
This work addresses general action-conditioned 3D human motion generation, with applications in animation and robotics. It is an incremental advance that combines existing techniques (Transformer, GAN) with a new dataset.
The paper tackles the generation of 3D human motions conditioned on actions, covering both single-person and multi-person interactive actions. It proposes ActFormer, a GAN-based Transformer, and introduces a new synthetic dataset of multi-person combat behaviors. It reports superior performance over state-of-the-art methods on NTU-13, NTU RGB+D 120, BABEL and the proposed combat dataset.
We present a GAN-based Transformer for general action-conditioned 3D human motion generation, covering not only single-person actions but also multi-person interactive actions. Our approach consists of a powerful Action-conditioned motion TransFormer (ActFormer) trained under a GAN scheme and equipped with a Gaussian Process latent prior. This design combines the strong spatio-temporal representation capacity of the Transformer, the generative modeling strength of GANs, and the inherent temporal correlations of the latent prior. Furthermore, ActFormer extends naturally to multi-person motions by alternately modeling temporal correlations and human interactions with Transformer encoders. To further facilitate research on multi-person motion generation, we introduce a new synthetic dataset of complex multi-person combat behaviors. Extensive experiments on NTU-13, NTU RGB+D 120, BABEL and the proposed combat dataset show that our method adapts to various human motion representations and achieves superior performance over state-of-the-art methods on both single-person and multi-person motion generation tasks, demonstrating a promising step towards a general human motion generator.
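As a rough illustration of how a Gaussian Process latent prior can supply temporal correlations, the sketch below draws a latent sequence whose per-frame vectors vary smoothly over time. The abstract does not specify the kernel or its parameters, so the squared-exponential kernel, the `length_scale` and `jitter` values, the independence of latent channels, and all function names here are assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def rbf_kernel(num_frames, length_scale):
    """Squared-exponential covariance between frame indices (assumed kernel)."""
    return [
        [math.exp(-((i - j) ** 2) / (2.0 * length_scale ** 2))
         for j in range(num_frames)]
        for i in range(num_frames)
    ]


def cholesky(matrix, jitter=1e-6):
    """Lower-triangular factor of (matrix + jitter * I).

    The small diagonal jitter keeps the factorization numerically stable.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] + jitter - acc)
            else:
                lower[i][j] = (matrix[i][j] - acc) / lower[j][j]
    return lower


def sample_gp_latents(num_frames, latent_dim, length_scale=4.0, seed=None):
    """Return a num_frames x latent_dim latent sequence.

    Each latent channel is an independent zero-mean GP draw over time,
    obtained by multiplying white noise with the Cholesky factor.
    """
    rng = random.Random(seed)
    lower = cholesky(rbf_kernel(num_frames, length_scale))
    latents = [[0.0] * latent_dim for _ in range(num_frames)]
    for d in range(latent_dim):
        white = [rng.gauss(0.0, 1.0) for _ in range(num_frames)]
        for t in range(num_frames):
            latents[t][d] = sum(lower[t][k] * white[k] for k in range(t + 1))
    return latents


if __name__ == "__main__":
    # Hypothetical sizes: 16 frames, 4 latent channels.
    for frame in sample_gp_latents(16, 4, seed=0)[:3]:
        print([round(value, 3) for value in frame])
```

With a larger `length_scale`, neighboring frames receive more similar latent vectors. Replacing the Cholesky factor with the identity matrix would recover independent per-frame noise, which is the baseline such a prior is meant to improve on.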