SimpliHuMoN: Simplifying Human Motion Prediction
This work provides a unified and high-performing solution for human motion prediction, benefiting applications in robotics, animation, and virtual reality.
This paper tackles the problem of holistic human motion prediction, which combines trajectory forecasting and human pose prediction. The proposed transformer-based model achieves state-of-the-art results across pose-only, trajectory-only, and combined prediction tasks on various benchmark datasets.
Human motion prediction combines two tasks: trajectory forecasting and human pose prediction. Specialized models have been developed for each task. Combining these models for holistic human motion prediction is non-trivial, and recent methods have struggled to compete on the established benchmarks for the individual tasks. To address this, we propose a simple yet effective transformer-based model for human motion prediction. The model employs a stack of self-attention modules to capture both spatial dependencies within a pose and temporal relationships across a motion sequence. This streamlined, end-to-end model is versatile enough to handle pose-only, trajectory-only, and combined prediction tasks without task-specific modifications. Extensive experiments on a wide range of benchmark datasets, including Human3.6M, AMASS, ETH-UCY, and 3DPW, demonstrate that this approach achieves state-of-the-art results across all tasks.
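To make the idea of one self-attention stack covering both spatial and temporal structure concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes (the abstract does not say so) that every (frame, joint) pair becomes one token, so a single attention matrix can relate joints within a pose and the same joint across time. The names `self_attention` and `encode_motion`, the random untrained weights, the layer count, and the feature width are all hypothetical; positional encodings, multi-head attention, normalization, and the prediction head are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention over all tokens.
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

def encode_motion(motion, n_layers=2, d_model=16, seed=0):
    # motion: array of shape (frames, joints, coords).
    # Assumption: one token per (frame, joint) pair; weights are random
    # placeholders standing in for learned parameters.
    rng = np.random.default_rng(seed)
    n_frames, n_joints, n_coords = motion.shape
    w_in = rng.normal(scale=0.1, size=(n_coords, d_model))
    tokens = motion.reshape(n_frames * n_joints, n_coords) @ w_in
    weights = None
    for _ in range(n_layers):
        w_q, w_k, w_v = (rng.normal(scale=0.1, size=(d_model, d_model))
                         for _ in range(3))
        attended, weights = self_attention(tokens, w_q, w_k, w_v)
        tokens = tokens + attended  # residual connection
    # Return per-token features and the last layer's attention weights.
    return tokens.reshape(n_frames, n_joints, d_model), weights

# Pose input: 8 frames, 17 joints, 3D coordinates (illustrative sizes).
pose = np.random.default_rng(1).normal(size=(8, 17, 3))
pose_features, pose_attention = encode_motion(pose)

# Trajectory-only input reuses the same code with a single "joint" in 2D.
trajectory = np.random.default_rng(2).normal(size=(8, 1, 2))
trajectory_features, _ = encode_motion(trajectory)
```

Under this tokenization, the attention matrix is square over all frame-joint tokens, so no separate spatial and temporal modules are needed; a trajectory-only input is simply the special case of one token per frame.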