Multi-Head Attention for Multi-Modal Joint Vehicle Motion Forecasting
The paper addresses the problem of predicting uncertain, interacting vehicle motions for autonomous driving systems, and represents an incremental improvement over existing methods.
The paper tackles vehicle motion forecasting by developing a method that produces joint multi-modal probability forecasts for all vehicles in a scene using multi-head attention and LSTM layers, achieving higher prediction likelihood than state-of-the-art models on the same dataset.
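The following minimal NumPy sketch makes the interaction mechanism more concrete. It shows multi-head self-attention applied across the vehicles of one scene, so that every vehicle's representation is updated with information from every other vehicle. It assumes each vehicle's position track has already been summarized into a fixed-size vector (for instance, the final hidden state of an LSTM encoder). The function name `multi_head_vehicle_attention`, the random projection matrices, and the absence of masking, residual connections, or normalization are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max along the axis for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_vehicle_attention(h, w_q, w_k, w_v, w_o, n_heads):
    """Hypothetical self-attention across all vehicles in a scene.

    h:        (n_vehicles, d_model) per-vehicle encodings (assumed, e.g. LSTM states)
    w_q, w_k, w_v, w_o: (d_model, d_model) projection matrices (assumed)
    Returns the updated encodings (n_vehicles, d_model) and the
    attention weights (n_heads, n_vehicles, n_vehicles).
    """
    n, d_model = h.shape
    assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
    d_head = d_model // n_heads

    def split_heads(x):
        # (n, d_model) -> (n_heads, n, d_head)
        return x.reshape(n, n_heads, d_head).transpose(1, 0, 2)

    q = split_heads(h @ w_q)
    k = split_heads(h @ w_k)
    v = split_heads(h @ w_v)

    # Scaled dot-product score between every pair of vehicles, per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_heads, n, n)
    weights = softmax(scores, axis=-1)  # each row sums to 1 over vehicles
    heads = weights @ v  # (n_heads, n, d_head)

    # Concatenate the heads and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(n, d_model)
    return concat @ w_o, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_vehicles, d_model, n_heads = 5, 16, 4
    h = rng.normal(size=(n_vehicles, d_model))
    w_q, w_k, w_v, w_o = (
        rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(4)
    )
    out, weights = multi_head_vehicle_attention(h, w_q, w_k, w_v, w_o, n_heads)
    print(out.shape, weights.shape)  # (5, 16) (4, 5, 5)
```

Because every vehicle attends to every other vehicle, each output row depends on the whole scene, and reordering the vehicles only reorders the output rows. That property fits a model that treats the scene as a set of position tracks rather than as a spatial grid.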
This paper presents a novel vehicle motion forecasting method based on multi-head attention. It produces joint forecasts for all vehicles in a road scene as sequences of multi-modal probability density functions of their positions. Its architecture uses multi-head attention to account for complete interactions between all vehicles, and long short-term memory (LSTM) layers for encoding and forecasting. It relies solely on vehicle position tracks, does not need maneuver definitions, and does not represent the scene with a spatial grid. This makes it more versatile than similar models while combining all of the following forecasting capabilities: joint forecasting with interactions, uncertainty estimation, and multi-modality. Its prediction likelihood is higher than that of state-of-the-art models evaluated on the same dataset.
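The prediction likelihood used for comparison measures how much probability the forecast assigns to the positions the vehicles actually reached; a lower average negative log-likelihood (NLL) means a higher likelihood. As a sketch, assuming each multi-modal density of a position is a mixture of bivariate Gaussians (the abstract does not name the parametric family), the average NLL of an observed track can be computed as below. The class `Gaussian2D` and the functions `mixture_nll` and `trajectory_nll` are hypothetical names for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian2D:
    """One mode of the assumed mixture: a bivariate Gaussian over (x, y)."""
    mu_x: float
    mu_y: float
    sigma_x: float  # > 0
    sigma_y: float  # > 0
    rho: float      # correlation coefficient in (-1, 1)

    def log_pdf(self, x: float, y: float) -> float:
        dx = (x - self.mu_x) / self.sigma_x
        dy = (y - self.mu_y) / self.sigma_y
        one_minus_rho2 = 1.0 - self.rho ** 2
        # Mahalanobis term of the bivariate normal density.
        z = (dx * dx - 2.0 * self.rho * dx * dy + dy * dy) / one_minus_rho2
        log_norm = math.log(
            2.0 * math.pi * self.sigma_x * self.sigma_y * math.sqrt(one_minus_rho2)
        )
        return -0.5 * z - log_norm


def mixture_nll(weights, components, x, y):
    """Negative log-likelihood of (x, y) under a Gaussian mixture.

    weights are the mode probabilities and are assumed to sum to 1.
    Uses log-sum-exp so that unlikely modes do not underflow.
    """
    terms = [math.log(w) + c.log_pdf(x, y) for w, c in zip(weights, components)]
    m = max(terms)
    return -(m + math.log(sum(math.exp(t - m) for t in terms)))


def trajectory_nll(forecast, observed):
    """Average NLL over time steps.

    forecast: list of (weights, components), one per future time step
    observed: list of (x, y) positions actually reached, same length
    """
    total = sum(
        mixture_nll(w, comps, x, y) for (w, comps), (x, y) in zip(forecast, observed)
    )
    return total / len(observed)


if __name__ == "__main__":
    # Illustrative two-mode forecast for one vehicle at one time step
    # (e.g. "keep lane" vs "change lane"); all numbers are made up.
    step = (
        [0.7, 0.3],
        [Gaussian2D(20.0, 0.0, 1.0, 0.5, 0.0), Gaussian2D(19.0, 3.5, 1.5, 0.8, 0.2)],
    )
    print(trajectory_nll([step], [(20.2, 0.1)]))
```

Scoring the mixture as a whole, rather than only its most probable mode, rewards a forecast that keeps some probability on each plausible maneuver, which is what a multi-modal model is meant to do.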