CV · Mar 25

TIGeR: A Unified Framework for Time, Images and Geo-location Retrieval

arXiv:2603.24749 · 74.5 · h-index: 7
Predicted impact: top 36% in CV · last 90 days
Originality: Incremental advance
AI Analysis

The paper addresses the need for complex retrieval capabilities in real-world applications such as digital forensics and environmental analysis, though its improvement over existing methods is incremental.

The paper tackles the problem of jointly reasoning about visual appearance, geolocation, and time for applications like digital forensics and urban monitoring, proposing TIGeR, a unified framework that outperforms baselines by up to 16% in time-of-year prediction and 14% in geo-time aware retrieval recall.

Many real-world applications in digital forensics, urban monitoring, and environmental analysis require jointly reasoning about visual appearance, geolocation, and time. Beyond standard geo-localization and time-of-capture prediction, these applications increasingly demand more complex capabilities, such as retrieving an image captured at the same location as a query image but at a specified target time. We formalize this problem as Geo-Time Aware Image Retrieval and curate a diverse benchmark of 4.5M paired image-location-time triplets for training and 86k high-quality triplets for evaluation. We then propose TIGeR, a multi-modal-transformer-based model that maps image, geolocation, and time into a unified geo-temporal embedding space. TIGeR supports flexible input configurations (single-modality and multi-modality queries) and uses the same representation to perform (i) geo-localization, (ii) time-of-capture prediction, and (iii) geo-time-aware retrieval. By better preserving underlying location identity under large appearance changes, TIGeR enables retrieval based on where and when a scene was captured, rather than purely on visual similarity. Extensive experiments show that TIGeR consistently outperforms strong baselines and state-of-the-art methods by up to 16% on time-of-year prediction, 8% on time-of-day prediction, and 14% in geo-time-aware retrieval recall, highlighting the benefits of unified geo-temporal modeling.
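To make the idea of retrieving by "where and when" more concrete, here is a minimal Python sketch of nearest-neighbour retrieval in a shared geo-temporal vector space. It is not TIGeR's architecture: the abstract describes learned transformer embeddings, whereas this toy uses hand-crafted encodings (a unit vector on the sphere for location, points on two unit circles for month and hour) as a stand-in. All function names, the gallery entries, and the coordinates are hypothetical and chosen only for illustration.

```python
import math

def encode_geo(lat_deg, lon_deg):
    """Map latitude/longitude (degrees) to a 3-D unit vector on the sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return [math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat)]

def encode_time(month, hour):
    """Encode month (1-12) and hour (0-23) cyclically on two unit circles."""
    m = 2 * math.pi * (month - 1) / 12
    h = 2 * math.pi * hour / 24
    return [math.cos(m), math.sin(m), math.cos(h), math.sin(h)]

def embed(lat_deg, lon_deg, month, hour):
    """Toy stand-in for a learned geo-temporal embedding (assumption)."""
    return encode_geo(lat_deg, lon_deg) + encode_time(month, hour)

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query, gallery, k=1):
    """Return the ids of the k gallery entries most similar to the query."""
    ranked = sorted(gallery, key=lambda item: cosine(query, item[1]), reverse=True)
    return [item_id for item_id, _ in ranked[:k]]

# Hypothetical gallery: (id, embedding) pairs.
gallery = [
    ("paris_july_noon", embed(48.86, 2.35, 7, 12)),
    ("paris_jan_night", embed(48.86, 2.35, 1, 23)),
    ("tokyo_jan_night", embed(35.68, 139.69, 1, 23)),
]

# Query: same location as the Paris entries, target time January at 22:00.
query = embed(48.86, 2.35, 1, 22)
top = retrieve(query, gallery, k=2)
print(top)
```

In this toy setup the January night-time Paris entry ranks first because it matches the query in both location and time, while the July noon entry from the same location ranks last. A learned model would replace `embed` with encoders for image, location, and time that share one embedding space.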

