CV · AI · Oct 1, 2025

GeoSURGE: Geo-localization using Semantic Fusion with Hierarchy of Geographic Embeddings

arXiv:2510.01448v1 · h-index: 30
Originality: Incremental advance
AI Analysis

This work addresses the problem of determining geographic locations from images for applications like mapping and navigation, representing an incremental improvement over existing methods.

The paper tackles worldwide visual geo-localization by aligning query images with a hierarchical geographic representation and by fusing appearance features with semantic segmentation. It reports new best results on 22 of 25 metrics across five benchmark datasets, compared with prior state-of-the-art (SOTA) methods and Large Vision-Language Models (LVLMs).

Worldwide visual geo-localization seeks to determine the geographic location of an image anywhere on Earth using only its visual content. Learned representations of geography for visual geo-localization remain an active research topic despite much progress. We formulate geo-localization as aligning the visual representation of the query image with a learned geographic representation. Our novel geographic representation explicitly models the world as a hierarchy of geographic embeddings. Additionally, we introduce an approach to efficiently fuse the appearance features of the query image with its semantic segmentation map, forming a robust visual representation. Our main experiments demonstrate improved all-time bests in 22 out of 25 metrics measured across five benchmark datasets compared to prior state-of-the-art (SOTA) methods and recent Large Vision-Language Models (LVLMs). Additional ablation studies support the claim that these gains are primarily driven by the combination of geographic and visual representations.
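To make the two ideas in the abstract concrete, the following is a minimal toy sketch, not the paper's implementation. It assumes that fusion is a simple concatenation of normalized appearance and segmentation features, and that alignment is a greedy coarse-to-fine cosine-similarity search over a hand-written hierarchy. The names `fuse`, `localize`, and `HIERARCHY`, along with every embedding value, are hypothetical and chosen only for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products act as cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def fuse(appearance, seg_histogram):
    """Hypothetical fusion (an assumption, not the paper's method):
    concatenate normalized appearance features with a normalized
    histogram of semantic segmentation classes."""
    return normalize(appearance) + normalize(seg_histogram)


def localize(query, hierarchy):
    """Greedy coarse-to-fine search: at each level, descend into the
    child whose geographic embedding best matches the query."""
    node = {"children": hierarchy}
    while node.get("children"):
        node = max(node["children"], key=lambda c: cosine(query, c["embedding"]))
    return node["name"], node["latlon"]


# Toy two-level hierarchy with made-up embeddings (illustration only).
HIERARCHY = [
    {
        "name": "Europe",
        "embedding": [1.0, 0.0, 1.0, 0.0],
        "latlon": (54.0, 15.0),
        "children": [
            {"name": "Paris", "embedding": [1.0, 0.2, 1.0, 0.0], "latlon": (48.86, 2.35)},
            {"name": "Rome", "embedding": [1.0, 0.0, 0.6, 0.4], "latlon": (41.90, 12.50)},
        ],
    },
    {
        "name": "Asia",
        "embedding": [0.0, 1.0, 0.0, 1.0],
        "latlon": (34.0, 100.0),
        "children": [
            {"name": "Tokyo", "embedding": [0.0, 1.0, 0.2, 1.0], "latlon": (35.68, 139.69)},
        ],
    },
]

# A query image reduced to 2 appearance features and a 2-class segmentation histogram.
query = fuse([0.9, 0.1], [0.7, 0.3])
prediction = localize(query, HIERARCHY)
print(prediction)
```

A greedy descent like this is cheap because it compares the query against only one branch per level, but an early mistake at a coarse level cannot be corrected later. A learned system would typically train the embeddings jointly rather than fix them by hand as done here.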

