Hyper-STTN: Hypergraph Augmented Spatial-Temporal Transformer Network for Trajectory Prediction
This work addresses trajectory prediction for applications such as autonomous driving and social robotics, and represents an incremental improvement over existing methods.
The paper predicts crowd trajectories by modeling both pairwise and groupwise interactions; the proposed Hyper-STTN method outperforms state-of-the-art baselines on public pedestrian motion datasets.
Predicting crowd intentions and trajectories is critical for a range of real-world applications, including social robotics and autonomous driving. Accurately modeling such behavior remains challenging due to the complexity of pairwise spatial-temporal interactions and the heterogeneous influence of groupwise dynamics. To address these challenges, we propose Hyper-STTN, a Hypergraph-based Spatial-Temporal Transformer Network for crowd trajectory prediction. Hyper-STTN constructs multiscale hypergraphs of varying group sizes to model groupwise correlations, which are captured through spectral hypergraph convolution based on random-walk probabilities. In parallel, a spatial-temporal transformer learns pedestrians' pairwise latent interactions across spatial and temporal dimensions. These heterogeneous groupwise and pairwise features are then fused and aligned via a multimodal transformer. Extensive experiments on public pedestrian motion datasets demonstrate that Hyper-STTN consistently outperforms state-of-the-art baselines and ablation models.
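To make the idea of hypergraph convolution based on random-walk probabilities concrete, the following is a minimal NumPy sketch. It is not the authors' implementation: it assumes the standard hypergraph random walk (pick an incident hyperedge in proportion to its weight, then pick a member of that hyperedge uniformly), and the names `random_walk_transition`, `hypergraph_conv`, `H`, `w`, and `theta` are hypothetical. The group assignment in the example is made up for illustration.

```python
import numpy as np


def random_walk_transition(H, w=None):
    """Random-walk transition matrix of a hypergraph (illustrative assumption).

    H: (num_nodes, num_edges) binary incidence matrix; H[v, e] = 1 if
       pedestrian v belongs to group (hyperedge) e.
    w: optional (num_edges,) positive hyperedge weights; defaults to ones.
    """
    H = np.asarray(H, dtype=float)
    w = np.ones(H.shape[1]) if w is None else np.asarray(w, dtype=float)
    d_v = H @ w          # weighted vertex degree of each pedestrian
    d_e = H.sum(axis=0)  # hyperedge degree, i.e. group size
    if np.any(d_v == 0) or np.any(d_e == 0):
        raise ValueError("every node needs a hyperedge and every hyperedge a node")
    # P[u, v] = sum_e w[e] * H[u, e] * H[v, e] / (d_v[u] * d_e[e])
    return (H * w / d_e) @ H.T / d_v[:, None]


def hypergraph_conv(X, P, theta):
    """One propagation step: mix features along P, project, apply ReLU."""
    return np.maximum(P @ X @ theta, 0.0)


# Four pedestrians in two hypothetical groups: {0, 1, 2} and {2, 3}.
H = np.array([[1, 0],
              [1, 0],
              [1, 1],
              [0, 1]])
P = random_walk_transition(H)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))      # per-pedestrian input features
theta = rng.normal(size=(2, 3))  # projection weights (learned in practice)
out = hypergraph_conv(X, P, theta)
```

Each row of `P` sums to one, so the propagation step averages features over the groups a pedestrian belongs to. Pedestrians 0 and 3 share no group, so they exchange no information in a single step. A multiscale variant would build one incidence matrix per group size and combine the resulting features.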