Under the Hood of Transformer Networks for Trajectory Forecasting
This is an incremental study that provides a systematic analysis for researchers in trajectory forecasting, focusing on individual motion modeling.
The paper tackles the problem of individual human trajectory forecasting without social interactions or scene context, finding that Transformer Networks and Bidirectional Transformers outperform RNNs and LSTMs on the ETH+UCY benchmark and remain competitive with more complex methods.
Transformer Networks have established themselves as the de facto state of the art for trajectory forecasting, but there is currently no systematic study of their capability to model the motion patterns of people without interactions with other individuals or the social context. This paper proposes the first in-depth study of Transformer Networks (TF) and Bidirectional Transformers (BERT) for forecasting the individual motion of people, without bells and whistles. We conduct an exhaustive evaluation of input/output representations, problem formulations and sequence modeling, including a novel analysis of their capability to predict multi-modal futures. In a comparative evaluation on the ETH+UCY benchmark, both TF and BERT are top performers in predicting individual motions, clearly outperforming RNNs and LSTMs. Furthermore, they remain within a narrow margin of more complex techniques that include both social interactions and scene context. Source code will be released for all conducted experiments.
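
To make the comparative evaluation concrete, the sketch below computes the two error measures commonly reported on ETH+UCY: the average displacement error (ADE) and the final displacement error (FDE). The abstract does not name its metrics, so their use here is an assumption, and the function name `displacement_errors` and the toy trajectories are hypothetical illustrations rather than the authors' code.

```python
import math

def displacement_errors(predicted, ground_truth):
    """Return (ADE, FDE) for one trajectory given as lists of (x, y) points.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Euclidean distance between prediction and ground truth at each time step.
    errors = [
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
    ]
    ade = sum(errors) / len(errors)  # mean error over all predicted steps
    fde = errors[-1]                 # error at the final predicted step
    return ade, fde

# Toy example (made-up coordinates): the prediction moves straight along x,
# while the true path also drifts upward by one unit per step.
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ground_truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(predicted, ground_truth)
```

In practice these values would be averaged over every pedestrian and scene in a test split; the single-trajectory version is kept here only for readability.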