Decoder Fusion RNN: Context and Interaction Aware Decoders for Trajectory Prediction
This work addresses the challenge of safe autonomous driving by improving trajectory prediction, though its contribution is incremental because it builds on existing attention-based methods.
The paper tackles the problem of forecasting future trajectories of traffic agents for autonomous driving by proposing Decoder Fusion RNN (DF-RNN), which achieves state-of-the-art performance on the Argoverse motion forecasting dataset.
Forecasting the future behavior of all traffic agents in the vicinity is key to achieving safe and reliable autonomous driving systems. It is a challenging problem because agents adjust their behavior depending on their intentions, the actions of other agents, and the road layout. In this paper, we propose Decoder Fusion RNN (DF-RNN), a recurrent, attention-based approach for motion forecasting. Our network is composed of a recurrent behavior encoder, an inter-agent multi-headed attention module, and a context-aware decoder. We design a map encoder that embeds polyline segments, combines them to create a graph structure, and merges their relevant parts with the agents' embeddings. We fuse the encoded map information with further inter-agent interactions only inside the decoder and propose to use explicit training as a method to effectively utilize the information available. We demonstrate the efficacy of our method by testing it on the Argoverse motion forecasting dataset and show its state-of-the-art performance on the public benchmark.
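To make the inter-agent multi-headed attention step more concrete, the following is a minimal sketch of scaled dot-product attention across agent embeddings, written in plain Python. It is not the DF-RNN implementation: the function names, the toy embeddings, and the use of identity projections (the learned query, key, and value projections are omitted) are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_attention(embeddings, num_heads):
    """Inter-agent attention with identity projections (an assumption).

    embeddings: one feature vector per agent, all of the same length.
    Returns one updated vector per agent, same shape as the input.
    """
    dim = len(embeddings[0])
    if dim % num_heads != 0:
        raise ValueError("embedding size must be divisible by num_heads")
    head_dim = dim // num_heads
    outputs = [[] for _ in embeddings]
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        # Slice out this head's part of every agent embedding.
        heads = [e[lo:hi] for e in embeddings]
        for i, query in enumerate(heads):
            # Scaled dot-product score between agent i and every agent j.
            scores = [
                sum(q * k for q, k in zip(query, key)) / math.sqrt(head_dim)
                for key in heads
            ]
            weights = softmax(scores)
            # Weighted sum over all agents' values for this head.
            mixed = [
                sum(w * v[d] for w, v in zip(weights, heads))
                for d in range(head_dim)
            ]
            # Concatenate the heads back into one vector per agent.
            outputs[i].extend(mixed)
    return outputs


# Three hypothetical agent embeddings of size 4, attended with 2 heads.
agents = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, 0.5], [1.0, 1.0, 0.0, 0.0]]
attended = multi_head_attention(agents, num_heads=2)
```

Each output vector is a convex combination of the input embeddings within each head, so an agent's updated representation mixes in information from the agents it attends to most strongly. In a trained model, learned projections would decide which features drive that weighting.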