CV · Aug 28, 2024

Temporal Attention for Cross-View Sequential Image Localization

arXiv:2408.15569v12.0 · h-index: 29 · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses cross-view localization for applications such as autonomous navigation by moving beyond one-to-one retrieval to sequential image analysis, though its dataset contribution is incremental because it adapts existing datasets rather than introducing new ones.

The paper tackles fine-grained sequential localization of street-view images within a known satellite image patch. It introduces a Temporal Attention Module (TAM) and reports a 75.3% reduction in mean distance error on a sequential adaptation of the KITTI-CVL dataset.

This paper introduces a novel approach to enhancing cross-view localization, focusing on the fine-grained, sequential localization of street-view images within a single known satellite image patch, a significant departure from traditional one-to-one image retrieval methods. By expanding to sequential image fine-grained localization, our model, equipped with a novel Temporal Attention Module (TAM), leverages contextual information to significantly improve sequential image localization accuracy. Our method shows substantial reductions in both mean and median localization errors on the Cross-View Image Sequence (CVIS) dataset, outperforming current state-of-the-art single-image localization techniques. Additionally, by adapting the KITTI-CVL dataset into sequential image sets, we not only offer a more realistic dataset for future research but also demonstrate our model's robust generalization capabilities across varying times and areas, evidenced by a 75.3% reduction in mean distance error in cross-view sequential image localization.
