Beyond In-Distribution Performance: A Cross-Dataset Study of Trajectory Prediction Robustness
This work addresses robustness issues in trajectory prediction for autonomous vehicles, but it is incremental as it compares existing models without introducing new methods.
The study evaluated the out-of-distribution generalization of three state-of-the-art trajectory prediction models by training on Argoverse 2 and testing on Waymo Open Motion, and vice versa. The smallest model with the highest inductive bias generalized best when trained on the smaller dataset and tested on the larger one. In the reverse setting all models generalized poorly, although the same model still generalized best.
We study the Out-of-Distribution (OoD) generalization ability of three SotA trajectory prediction models with comparable In-Distribution (ID) performance but different model designs. We investigate the influence of inductive bias, training data size, and data augmentation strategy by training the models on Argoverse 2 (A2) and testing on Waymo Open Motion (WO), and vice versa. We find that the smallest model with the highest inductive bias exhibits the best OoD generalization across different augmentation strategies when trained on the smaller A2 dataset and tested on the larger WO dataset. In the converse setting, training all models on the larger WO dataset and testing on the smaller A2 dataset, we find that all models generalize poorly, even though the model with the highest inductive bias still exhibits the best generalization ability. We discuss possible reasons for this surprising finding and draw conclusions about the design and testing of trajectory prediction models and benchmarks.
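The train-on-one, test-on-the-other protocol described above can be pictured as a small evaluation grid. The following is only a minimal sketch under stated assumptions, not the authors' pipeline: the `evaluate` callback, the function names `cross_dataset_grid` and `ood_gap`, and the choice of a lower-is-better error metric are all hypothetical and do not come from the paper.

```python
from itertools import product
from typing import Callable, Dict, Tuple

# Dataset labels as abbreviated in the abstract: Argoverse 2 and Waymo Open Motion.
DATASETS: Tuple[str, ...] = ("A2", "WO")

Grid = Dict[Tuple[str, str], float]


def cross_dataset_grid(
    evaluate: Callable[[str, str], float],
    datasets: Tuple[str, ...] = DATASETS,
) -> Grid:
    """Score every (train, test) pairing of the given datasets.

    `evaluate(train, test)` is a hypothetical callback that trains a model on
    `train`, tests it on `test`, and returns an error value (assumed here to
    be lower-is-better). Pairs with train == test are the ID case; the
    remaining pairs are the OoD case.
    """
    return {(tr, te): evaluate(tr, te) for tr, te in product(datasets, repeat=2)}


def ood_gap(grid: Grid, train: str, test: str) -> float:
    """Error increase when testing on `test` instead of on the training dataset."""
    return grid[(train, test)] - grid[(train, train)]
```

Under these assumptions, comparing `ood_gap(grid, "A2", "WO")` with `ood_gap(grid, "WO", "A2")` for each model would expose the kind of asymmetry between the two transfer directions that the abstract reports.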