CV · AI · Sep 25, 2025

Enhancing Contrastive Learning for Geolocalization by Discovering Hard Negatives on Semivariograms

arXiv:2509.21573v1 · h-index: 7 · SIGSPATIAL/GIS
Originality: Incremental advance
AI Analysis

This work targets accurate global-scale geolocalization for applications such as mapping and navigation. The advance is incremental: it extends existing contrastive learning methods with spatial priors.

The paper tackles image-based geolocalization by addressing false and hard negatives in contrastive learning. It proposes a spatially regularized strategy that uses a semivariogram to model spatial dependencies, which improves performance on the OSV5M dataset, especially at finer granularity.

Accurate and robust image-based geo-localization at a global scale is challenging due to diverse environments, visually ambiguous scenes, and the lack of distinctive landmarks in many regions. While contrastive learning methods show promising performance by aligning features between street-view images and corresponding locations, they neglect the underlying spatial dependency in the geographic space. As a result, they fail to address false negatives (image pairs that are both visually and geographically similar but labeled as negatives) and struggle to distinguish hard negatives, which are visually similar but geographically distant. To address this issue, we propose a novel spatially regularized contrastive learning strategy that integrates a semivariogram, a geostatistical tool for modeling how spatial correlation changes with distance. We fit the semivariogram by relating the distance between images in feature space to their geographical distance, capturing how visual content is expected to vary with spatial separation. With the fitted semivariogram, we define the expected visual dissimilarity at a given spatial distance as a reference to identify hard negatives and false negatives. We integrate this strategy into GeoCLIP and evaluate it on the OSV5M dataset, demonstrating that explicitly modeling spatial priors improves image-based geo-localization performance, particularly at finer granularity.
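
The semivariogram idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes an empirical (binned) semivariogram with linear interpolation in place of a fitted parametric model, and the function names, the `ratio` threshold, and the `near_km` cutoff are all hypothetical choices made for illustration. A pair counts as "surprising" when its feature dissimilarity is well below what the semivariogram expects at its geographic distance; surprising pairs that are close together are treated as false negatives, and surprising pairs that are far apart as hard negatives.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres; inputs are in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def pairwise(features, coords):
    """Return pair indices, geographic distance, and semivariance per pair."""
    features = np.asarray(features, dtype=float)
    coords = np.asarray(coords, dtype=float)  # columns: latitude, longitude
    i, j = np.triu_indices(len(features), k=1)
    geo = haversine_km(coords[i, 0], coords[i, 1], coords[j, 0], coords[j, 1])
    # Semivariance of one pair: half the squared feature-space distance.
    semivar = 0.5 * np.sum((features[i] - features[j]) ** 2, axis=1)
    return i, j, geo, semivar


def empirical_semivariogram(features, coords, bin_edges):
    """Average the pair semivariances within each geographic distance bin."""
    _, _, geo, semivar = pairwise(features, coords)
    bin_edges = np.asarray(bin_edges, dtype=float)
    which = np.digitize(geo, bin_edges) - 1
    gamma = np.full(len(bin_edges) - 1, np.nan)  # NaN marks empty bins
    for k in range(len(gamma)):
        mask = which == k
        if mask.any():
            gamma[k] = semivar[mask].mean()
    centres = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    return centres, gamma


def label_pairs(features, coords, centres, gamma, ratio=0.5, near_km=25.0):
    """Label each pair using the semivariogram as the expected dissimilarity.

    ratio and near_km are illustrative thresholds, not values from the paper.
    """
    i, j, geo, observed = pairwise(features, coords)
    valid = ~np.isnan(gamma)
    # Linear interpolation stands in for a fitted semivariogram model.
    expected = np.interp(geo, centres[valid], gamma[valid])
    surprising = observed < ratio * expected
    labels = np.where(
        surprising & (geo <= near_km), "false_negative",
        np.where(surprising, "hard_negative", "ordinary"),
    )
    return i, j, labels
```

In a contrastive training loop, labels like these could be used to drop false negatives from the loss and to up-weight hard negatives, though how the paper applies them inside GeoCLIP is not specified in the abstract.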
