CV · LG · Dec 9, 2024

Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation

arXiv:2412.06781v1 · 25 citations · h-index: 10 · CVPR
Originality: Highly original
AI Analysis

The paper addresses the challenge of predicting where on Earth an image was captured, with applications in computer vision and geospatial analysis, and introduces a new probabilistic geolocation task.

To handle the ambiguity inherent in global visual geolocation, the authors propose a generative approach based on diffusion and Riemannian flow matching, reporting state-of-the-art performance on three benchmarks: OpenStreetView-5M, YFCC-100M, and iNat21.

Global visual geolocation predicts where an image was captured on Earth. Since images vary in how precisely they can be localized, this task inherently involves a significant degree of ambiguity. However, existing approaches are deterministic and overlook this aspect. In this paper, we aim to close the gap between traditional geolocalization and modern generative methods. We propose the first generative geolocation approach based on diffusion and Riemannian flow matching, where the denoising process operates directly on the Earth's surface. Our model achieves state-of-the-art performance on three visual geolocation benchmarks: OpenStreetView-5M, YFCC-100M, and iNat21. In addition, we introduce the task of probabilistic visual geolocation, where the model predicts a probability distribution over all possible locations instead of a single point. We introduce new metrics and baselines for this task, demonstrating the advantages of our diffusion-based approach. Code and models will be made available.
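To make "operates directly on the Earth's surface" more concrete, here is a minimal sketch of the kind of geodesic path that Riemannian flow matching on a sphere typically relies on. It is not the authors' implementation: the function names, the great-circle interpolation, and the sample coordinates are all assumptions chosen for illustration, and the paper's actual noise schedule and network are not described in this summary.

```python
import math

def latlon_to_unit(lat_deg, lon_deg):
    """Map latitude/longitude in degrees to a 3D unit vector on the sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def dot(a, b):
    """Euclidean dot product of two 3D tuples."""
    return sum(x * y for x, y in zip(a, b))

def geodesic_point_and_velocity(x0, x1, t):
    """Point at time t on the great-circle arc from x0 to x1, and its time derivative.

    Assumes x0 and x1 are unit vectors that are neither identical nor antipodal.
    """
    omega = math.acos(max(-1.0, min(1.0, dot(x0, x1))))  # arc angle between x0 and x1
    s = math.sin(omega)
    a = math.sin((1 - t) * omega) / s
    b = math.sin(t * omega) / s
    da = -omega * math.cos((1 - t) * omega) / s  # d(a)/dt
    db = omega * math.cos(t * omega) / s         # d(b)/dt
    xt = tuple(a * p + b * q for p, q in zip(x0, x1))
    vt = tuple(da * p + db * q for p, q in zip(x0, x1))
    return xt, vt

# Hypothetical example: a noise sample and a ground-truth location (Paris).
noise = latlon_to_unit(10.0, -150.0)
target = latlon_to_unit(48.8566, 2.3522)
xt, vt = geodesic_point_and_velocity(noise, target, 0.5)
```

In a flow-matching setup of this kind, a network conditioned on the image would be trained to regress a velocity such as `vt` at the intermediate point `xt`. Because the interpolated point stays on the unit sphere and the velocity stays tangent to it, integrating the learned field at inference time never leaves the Earth's surface.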

Code Implementations: 1 repo
