CV | AI | Nov 12, 2025

From Street to Orbit: Training-Free Cross-View Retrieval via Location Semantics and LLM Guidance

arXiv:2511.09820v1 | h-index: 3
Originality: Incremental advance
AI Analysis

The paper addresses the real-world deployment limitations of cross-view retrieval in applications such as autonomous navigation and urban planning by eliminating the need for supervised training. The advance is incremental, since it builds on existing pretrained models.

The paper tackles cross-view image retrieval, specifically street-to-satellite matching, by proposing a training-free framework that uses a pretrained vision encoder and LLM guidance, outperforming prior learning-based methods in zero-shot settings.

Cross-view image retrieval, particularly street-to-satellite matching, is a critical task for applications such as autonomous navigation, urban planning, and localization in GPS-denied environments. However, existing approaches often require supervised training on curated datasets and rely on panoramic or UAV-based images, which limits real-world deployment. In this paper, we present a simple yet effective cross-view image retrieval framework that leverages a pretrained vision encoder and a large language model (LLM), requiring no additional training. Given a monocular street-view image, our method extracts geographic cues through web-based image search and LLM-based location inference, generates a satellite query via a geocoding API, and retrieves matching tiles using a pretrained vision encoder (e.g., DINOv2) with PCA-based whitening for feature refinement. Despite using no ground-truth supervision or finetuning, our proposed method outperforms prior learning-based approaches on the benchmark dataset under zero-shot settings. Moreover, our pipeline enables automatic construction of semantically aligned street-to-satellite datasets, offering a scalable and cost-efficient alternative to manual annotation. All source code will be made publicly available at https://jeonghomin.github.io/street2orbit.github.io/.
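
The final retrieval step named in the abstract (encoder features refined by PCA-based whitening, then matched against satellite tiles) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not give the number of retained components, the regularization constant, or the similarity measure, so `n_components`, `eps`, and cosine similarity are assumptions here, and random vectors stand in for real encoder embeddings. The function names are hypothetical.

```python
import numpy as np


def fit_pca_whitening(features, n_components, eps=1e-6):
    """Estimate a PCA whitening transform from gallery features (rows = items)."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Sample covariance of the centered features.
    cov = centered.T @ centered / max(len(features) - 1, 1)
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Rescale each principal direction so projected data has unit variance.
    projection = eigvecs[:, order] / np.sqrt(eigvals[order] + eps)
    return mean, projection


def whiten(features, mean, projection):
    """Project features into the whitened space and L2-normalize each row."""
    z = (features - mean) @ projection
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)


def retrieve(query, gallery, top_k=5):
    """Return indices of the top_k gallery rows ranked by cosine similarity."""
    scores = gallery @ query  # rows are unit length, so this is cosine similarity
    return np.argsort(-scores)[:top_k]


# Assumption: random vectors stand in for satellite-tile embeddings.
rng = np.random.default_rng(0)
gallery_raw = rng.normal(size=(200, 32))
# Hypothetical query: a slightly perturbed copy of gallery item 17.
query_raw = gallery_raw[17] + 0.01 * rng.normal(size=32)

mean, projection = fit_pca_whitening(gallery_raw, n_components=16)
gallery = whiten(gallery_raw, mean, projection)
query = whiten(query_raw[None, :], mean, projection)[0]
top = retrieve(query, gallery, top_k=5)
print(top)
```

In this sketch the whitening statistics are fitted on the gallery alone, which needs no labels and is therefore compatible with a training-free setting. Whether the paper fits them the same way is not stated in the abstract.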

