CV | Feb 9, 2023

Diverse Human Motion Prediction Guided by Multi-Level Spatial-Temporal Anchors

arXiv:2302.04860v1 | 19.862 citations | h-index: 33 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of generating varied and accurate human motions for applications such as animation and robotics. It is an incremental advance because it builds on existing motion predictors.

The paper tackles the problem of diverse human motion prediction by introducing spatial-temporal anchors to prevent mode collapse, achieving state-of-the-art performance in both stochastic and deterministic prediction.

Predicting diverse human motions given a sequence of historical poses has received increasing attention. Despite rapid progress, existing work captures the multi-modal nature of human motions primarily through likelihood-based sampling, where mode collapse has been widely observed. In this paper, we propose a simple yet effective approach that disentangles randomly sampled codes with a deterministic learnable component named anchors to promote sample precision and diversity. Anchors are further factorized into spatial anchors and temporal anchors, which provide attractively interpretable control over spatial-temporal disparity. In principle, our spatial-temporal anchor-based sampling (STARS) can be applied to different motion predictors. Here we propose an interaction-enhanced spatial-temporal graph convolutional network (IE-STGCN) that encodes prior knowledge of human motions (e.g., spatial locality), and incorporate the anchors into it. Extensive experiments demonstrate that our approach outperforms the state of the art in both stochastic and deterministic prediction, suggesting it as a unified framework for modeling human motions. Our code and pretrained models are available at https://github.com/Sirui-Xu/STARS.
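
The anchor idea in the abstract can be illustrated with a small NumPy sketch. This is not the STARS implementation. It assumes that spatial anchors hold one offset per joint, that temporal anchors hold one offset per frame, that the two are combined by broadcast addition, and that a random code is added to each combined anchor. The function names `build_anchors` and `sample_codes`, the tensor shapes, and the additive combination are all hypothetical choices made for illustration. In a real model the anchors would be learnable parameters rather than random arrays.

```python
import numpy as np


def build_anchors(spatial, temporal):
    """Pair every spatial anchor with every temporal anchor (assumed scheme).

    spatial:  (Ks, J, C) one offset per joint, shared across frames
    temporal: (Kt, T, C) one offset per frame, shared across joints
    returns:  (Ks * Kt, T, J, C) one combined anchor per (spatial, temporal) pair
    """
    # Broadcast (Ks, 1, 1, J, C) + (1, Kt, T, 1, C) -> (Ks, Kt, T, J, C)
    combined = spatial[:, None, None, :, :] + temporal[None, :, :, None, :]
    return combined.reshape(-1, *combined.shape[2:])


def sample_codes(anchors, rng, noise_scale=1.0):
    """Add Gaussian noise to each deterministic anchor (assumed scheme)."""
    noise = rng.standard_normal(anchors.shape) * noise_scale
    return anchors + noise


rng = np.random.default_rng(0)
Ks, Kt, T, J, C = 2, 3, 4, 5, 8  # illustrative sizes only

# Stand-ins for learnable parameters.
spatial = rng.standard_normal((Ks, J, C))
temporal = rng.standard_normal((Kt, T, C))

anchors = build_anchors(spatial, temporal)  # one anchor per prediction mode
codes = sample_codes(anchors, rng)          # one noisy code per anchor
```

Under these assumptions, each of the `Ks * Kt` codes starts from a different deterministic anchor, so the samples are pushed toward distinct modes instead of depending on the random noise alone for diversity. Fixing the spatial index while varying the temporal index (or the reverse) corresponds to the kind of factorized control the abstract describes.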
