AdvMT: Adversarial Motion Transformer for Long-term Human Motion Prediction
This work addresses the problem of enabling seamless human-robot collaboration by improving motion prediction, though it appears incremental as it builds on existing transformer and adversarial methods.
The paper tackles the challenge of accurate long-term human motion prediction by introducing AdvMT, a model that combines a transformer-based motion encoder with adversarial training, improving long-term prediction accuracy while keeping short-term predictions robust.
To achieve seamless collaboration between robots and humans in a shared environment, accurately predicting future human movements is essential. Human motion prediction has traditionally been approached as a sequence prediction problem, in which historical human motion data is used to estimate future poses. Beginning with vanilla recurrent networks, the research community has investigated a variety of methods for learning human motion dynamics, including graph-based and generative approaches. Despite these efforts, accurate long-term prediction remains a significant challenge. To address this, we present the Adversarial Motion Transformer (AdvMT), a novel model that integrates a transformer-based motion encoder and a temporal continuity discriminator. This combination captures spatial and temporal dependencies simultaneously. With adversarial training, our method reduces unwanted artifacts in predictions, leading to more realistic and fluid predicted human motions. The evaluation results indicate that AdvMT greatly enhances the accuracy of long-term predictions while also delivering robust short-term predictions.
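The sequence prediction framing mentioned above can be illustrated with a minimal sketch that splits one motion clip into pairs of observed history and future target poses. This is not the AdvMT data pipeline. The function name `make_windows`, its parameters, and the flattened per-frame pose layout are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# One frame of motion: flattened joint coordinates (assumed layout).
Pose = Sequence[float]


def make_windows(
    motion: Sequence[Pose], history_len: int, future_len: int, stride: int = 1
) -> List[Tuple[List[Pose], List[Pose]]]:
    """Split one motion clip into (observed history, future target) pairs."""
    if history_len <= 0 or future_len <= 0 or stride <= 0:
        raise ValueError("history_len, future_len and stride must be positive")
    total = history_len + future_len
    pairs = []
    # Slide a window of `total` frames over the clip; clips shorter than
    # `total` frames yield no pairs.
    for start in range(0, len(motion) - total + 1, stride):
        history = list(motion[start:start + history_len])
        future = list(motion[start + history_len:start + total])
        pairs.append((history, future))
    return pairs


# Toy clip (hypothetical data): 6 frames, each with a single coordinate.
clip = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
pairs = make_windows(clip, history_len=3, future_len=2)
```

A predictor would take the history half of each pair as input and be trained to output the future half. Long-term prediction corresponds to a larger `future_len`.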