CVAILGSep 11, 2023

Can you text what is happening? Integrating pre-trained language encoders into trajectory prediction models for autonomous driving

arXiv:2309.05282v241 citationsh-index: 14
AI Analysis

This addresses scene understanding for autonomous driving, but it is incremental as it integrates text with existing rasterized image methods.

The paper tackles the problem of representing traffic scenes for autonomous driving by proposing a novel text-based representation processed with a pre-trained language encoder, and shows significant improvements in trajectory prediction on the nuScenes dataset compared to baselines.

In autonomous driving tasks, scene understanding is the first step towards predicting the future behavior of the surrounding traffic participants. Yet, how to represent a given scene and extract its features are still open research questions. In this study, we propose a novel text-based representation of traffic scenes and process it with a pre-trained language encoder. First, we show that text-based representations, combined with classical rasterized image representations, lead to descriptive scene embeddings. Second, we benchmark our predictions on the nuScenes dataset and show significant improvements compared to baselines. Third, we show in an ablation study that a joint encoder of text and rasterized images outperforms the individual encoders confirming that both representations have their complementary strengths.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes