Joint-Relation Transformer for Multi-Person Motion Prediction
This work addresses accurate prediction of future motion in multi-person scenes, which matters for applications such as robotics and surveillance. It strengthens interaction modeling with relation-aware methods, though the contribution is incremental because it builds on existing Transformer approaches.
The paper tackles multi-person motion prediction by incorporating explicit relation information, such as skeleton structure and pairwise distances, into a Transformer model. It reports a 13.4% improvement in 900ms VIM on 3DPW-SoMoF/RC and 17.8%/12.0% improvements in 3s MPJPE on the CMU-Mocap/MuPoTS-3D datasets.
Multi-person motion prediction is a challenging problem because motion depends on both individual past movements and interactions with other people. Transformer-based methods have shown promising results on this task, but they lack an explicit representation of the relations between joints, such as skeleton structure and pairwise distance, which is crucial for accurate interaction modeling. In this paper, we propose the Joint-Relation Transformer, which utilizes relation information to enhance interaction modeling and improve future motion prediction. Our relation information contains the relative distance and the intra-/inter-person physical constraints. To fuse relation and joint information, we design a novel joint-relation fusion layer with relation-aware attention to update both features. Additionally, we supervise the relation information by forecasting future distance. Experiments show that our method achieves a 13.4% improvement in 900ms VIM on 3DPW-SoMoF/RC and 17.8%/12.0% improvements in 3s MPJPE on the CMU-Mocap/MuPoTS-3D datasets.
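The relation information and relation-aware attention described above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. The function names (build_relations, relation_aware_attention), the three-channel relation layout (distance, bone adjacency, same-person flag), and the use of the relation as an additive attention bias are all assumptions made for illustration. The actual fusion layer also updates the relation features, which this sketch omits.

```python
import numpy as np

def build_relations(joints, person_ids, bones):
    """Hypothetical relation tensor for one frame.
    joints: (N, 3) positions of all joints of all people.
    person_ids: (N,) index of the person each joint belongs to.
    bones: list of (i, j) joint index pairs connected by a bone.
    Returns (N, N, 3): pairwise distance, bone adjacency, same-person flag."""
    diff = joints[:, None, :] - joints[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)            # relative distance
    adj = np.zeros_like(dist)
    for i, j in bones:                              # intra-person skeleton links
        adj[i, j] = adj[j, i] = 1.0
    same = (person_ids[:, None] == person_ids[None, :]).astype(dist.dtype)
    return np.stack([dist, adj, same], axis=-1)

def relation_aware_attention(x, rel, w_q, w_k, w_v, w_r):
    """Assumed simplification: the relation enters as an additive bias.
    x: (N, D) joint features, rel: (N, N, R), w_r: (R,)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1]) + rel @ w_r
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ v, attn

# Toy scene: two people with two joints each, one bone per person.
joints = np.array([[0., 0., 0.], [0., 1., 0.], [3., 0., 0.], [3., 1., 0.]])
person_ids = np.array([0, 0, 1, 1])
bones = [(0, 1), (2, 3)]
rel = build_relations(joints, person_ids, bones)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                         # random joint features
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
w_r = np.array([-1.0, 1.0, 0.5])                    # arbitrary bias weights
out, attn = relation_aware_attention(x, rel, w_q, w_k, w_v, w_r)
```

With these arbitrary weights, the bias lowers attention between distant joints and raises it between joints linked by a bone or belonging to the same person. In a trained model such weights would be learned rather than fixed by hand.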