CV | RO | Nov 4, 2024

Multi-Transmotion: Pre-trained Model for Human Motion Prediction

arXiv:2411.02673v1 | 24 citations | h-index: 7 | Has Code | CoRL
Originality: Incremental advance
AI Analysis

This work addresses human motion prediction for applications in autonomous vehicles and social robotics. The advance is incremental, since it builds on existing datasets and methods.

The paper tackles the lack of a standardized dataset for human motion prediction by integrating seven datasets across modalities to propose a pre-trained transformer-based model, achieving competitive performance on downstream tasks like trajectory and pose prediction.

The ability of intelligent systems to predict human behaviors is crucial, particularly in fields such as autonomous vehicle navigation and social robotics. However, the complexity of human motion has prevented the development of a standardized dataset for human motion prediction, thereby hindering the establishment of pre-trained models. In this paper, we address these limitations by integrating multiple datasets, encompassing both trajectory and 3D pose keypoints, to propose a pre-trained model for human motion prediction. We merge seven distinct datasets across varying modalities and standardize their formats. To facilitate multimodal pre-training, we introduce Multi-Transmotion, an innovative transformer-based model designed for cross-modality pre-training. Additionally, we present a novel masking strategy to capture rich representations. Our methodology demonstrates competitive performance across various datasets on several downstream tasks, including trajectory prediction in the NBA and JTA datasets, as well as pose prediction in the AMASS and 3DPW datasets. The code is publicly available: https://github.com/vita-epfl/multi-transmotion
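The abstract mentions a masking strategy for pre-training but does not describe it here. As a rough illustration of the general idea only, the sketch below applies plain random masking to a motion sequence stored as frames of 3D keypoints. This is a generic technique, not the paper's method, and the function name `mask_tokens`, the `mask_ratio` parameter, and the frames-by-keypoints layout are all assumptions made for this example.

```python
import random

def mask_tokens(seq, mask_ratio=0.3, seed=0):
    """Randomly hide (frame, keypoint) tokens in a motion sequence.

    seq: list of frames, each frame a list of (x, y, z) keypoints.
    Returns (masked, mask): masked is a copy of seq with hidden tokens
    zeroed out, and mask[t][j] is True where token (t, j) was hidden.
    NOTE: hypothetical helper for illustration, not the paper's strategy.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    masked, mask = [], []
    for frame in seq:
        # Decide independently for each keypoint whether to hide it.
        frame_mask = [rng.random() < mask_ratio for _ in frame]
        masked.append([
            (0.0, 0.0, 0.0) if hide else tuple(point)
            for point, hide in zip(frame, frame_mask)
        ])
        mask.append(frame_mask)
    return masked, mask

# Toy sequence: 9 frames, 4 keypoints per frame, all at (1, 1, 1).
seq = [[(1.0, 1.0, 1.0)] * 4 for _ in range(9)]
masked, mask = mask_tokens(seq, mask_ratio=0.5, seed=0)
```

In a pre-training setup of this kind, a model would typically be trained to reconstruct the hidden tokens from the visible ones. The actual strategy used by Multi-Transmotion is in the linked repository.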

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
