Stochastic Multi-Person 3D Motion Forecasting
This work addresses the need for realistic multi-person motion prediction in applications such as robotics and animation, though it builds incrementally on existing generative models.
The paper tackles the problem of forecasting diverse and socially plausible 3D motion for multiple people. It introduces a dual-level generative modeling framework that separately handles individual motion and social interactions, and it reports state-of-the-art results on the CMU-Mocap, MuPoTS-3D, and SoMoF benchmarks.
This paper addresses real-world complexities that prior work on human motion forecasting has ignored: the social properties of multi-person motion, the diversity of motion and social interactions, and the complexity of articulated motion. To this end, we introduce a novel task of stochastic multi-person 3D motion forecasting. We propose a dual-level generative modeling framework that separately models independent individual motion at the local level and social interactions at the global level. Notably, this dual-level modeling can be achieved within a single shared generative model by introducing learnable latent codes that represent intents of future motion and by switching the codes' modes of operation between the two levels. Our framework is general; we instantiate it with different generative models, including generative adversarial networks and diffusion models, and with various multi-person forecasting models. Extensive experiments on the CMU-Mocap, MuPoTS-3D, and SoMoF benchmarks show that our approach produces diverse and accurate multi-person predictions, significantly outperforming the state of the art.
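The abstract does not spell out how the latent codes switch modes, so the following is only a toy illustration of one possible reading, not the authors' architecture. It assumes a small codebook of intent codes shared by both levels: at the local level each person's forecast depends only on that person's own code, and at the global level each forecast also depends on the codes of everyone in the scene. The names `init_codes` and `generate`, the mean-pooling used as social context, and the one-step additive "forecast" are all hypothetical stand-ins.

```python
import numpy as np

def init_codes(num_codes, dim, seed=0):
    """Hypothetical codebook of latent intent codes (these would be learned in a real model)."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(num_codes, dim))

def generate(past, codes, code_ids, level):
    """Toy shared generator used at both levels (an assumption, not the paper's model).

    past:     (P, T, D) observed motion of P people over T frames, pose dimension D
    codes:    (K, D) codebook of intent codes
    code_ids: (P,) index of the code assigned to each person
    level:    "local"  -> each person is driven only by their own code
              "global" -> each person also sees the mean of all assigned codes
    """
    z = codes[code_ids]  # (P, D): one intent code per person
    if level == "global":
        # Crude stand-in for social interaction: add scene-wide context.
        z = z + z.mean(axis=0, keepdims=True)
    elif level != "local":
        raise ValueError("level must be 'local' or 'global'")
    last_pose = past[:, -1, :]  # (P, D): most recent observed pose
    return last_pose + z        # (P, D): one-step toy forecast

codes = init_codes(num_codes=4, dim=6)
past = np.zeros((2, 5, 6))  # two people, five observed frames

# Change only person 1's intent code and watch person 0's forecast.
local_a = generate(past, codes, np.array([0, 1]), "local")
local_b = generate(past, codes, np.array([0, 2]), "local")
global_a = generate(past, codes, np.array([0, 1]), "global")
global_b = generate(past, codes, np.array([0, 2]), "global")

print(np.allclose(local_a[0], local_b[0]))    # person 0 unaffected at the local level
print(np.allclose(global_a[0], global_b[0]))  # person 0 affected at the global level
```

In this sketch the same function and the same codebook serve both levels, and only the `level` argument changes how the codes are used, which is the property the abstract attributes to the shared generative model. Sampling different `code_ids` is what would produce diverse forecasts.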