CV · Jan 19

GTPred: Benchmarking MLLMs for Interpretable Geo-localization and Time-of-capture Prediction

arXiv:2601.13207v1 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for better geo-temporal prediction benchmarks for researchers and developers working with multi-modal AI systems, though its contribution is incremental: it extends existing geo-localization benchmarks with a temporal dimension.

The authors introduce GTPred, a benchmark for evaluating multi-modal large language models (MLLMs) on both location and time-of-capture prediction from images. They find that incorporating temporal information significantly enhances location inference performance.

Geo-localization aims to infer the geographic location where an image was captured using observable visual evidence. Traditional methods achieve impressive results through large-scale training on massive image corpora. With the emergence of multi-modal large language models (MLLMs), recent studies have explored their applications in geo-localization, benefiting from improved accuracy and interpretability. However, existing benchmarks largely ignore the temporal information inherent in images, which can further constrain the location. To bridge this gap, we introduce GTPred, a novel benchmark for geo-temporal prediction. GTPred comprises 370 globally distributed images spanning over 120 years. We evaluate MLLM predictions by jointly considering year and hierarchical location sequence matching, and further assess intermediate reasoning chains using meticulously annotated ground-truth reasoning processes. Experiments on 8 proprietary and 7 open-source MLLMs show that, despite strong visual perception, current models remain limited in world knowledge and geo-temporal reasoning. Results also demonstrate that incorporating temporal information significantly enhances location inference performance.
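As a rough illustration of what "jointly considering year and hierarchical location sequence matching" could look like, the Python sketch below scores a prediction by how many levels of a coarse-to-fine location sequence it matches and by how close the predicted year is. This is not the paper's metric: the function names, the prefix-matching rule, the linear year decay, the 10-year tolerance, and the equal weighting are all assumptions made for illustration.

```python
from typing import Sequence


def location_prefix_score(pred: Sequence[str], truth: Sequence[str]) -> float:
    """Fraction of ground-truth levels matched, coarsest level first.

    Hypothetical rule: matching stops at the first level that differs,
    so a wrong country scores 0 even if the city name happens to match.
    """
    if not truth:
        raise ValueError("truth must contain at least one level")
    matched = 0
    for p, t in zip(pred, truth):
        if p.strip().lower() != t.strip().lower():
            break
        matched += 1
    return matched / len(truth)


def year_score(pred_year: int, true_year: int, tolerance: int = 10) -> float:
    """Linear decay (assumed): 1.0 for an exact year, 0.0 at `tolerance` years off."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    error = abs(pred_year - true_year)
    return max(0.0, 1.0 - error / tolerance)


def joint_score(
    pred_loc: Sequence[str],
    truth_loc: Sequence[str],
    pred_year: int,
    true_year: int,
    weight: float = 0.5,
    tolerance: int = 10,
) -> float:
    """Weighted combination of the two scores; the weighting is an assumption."""
    loc = location_prefix_score(pred_loc, truth_loc)
    yr = year_score(pred_year, true_year, tolerance)
    return weight * loc + (1.0 - weight) * yr


if __name__ == "__main__":
    truth_loc = ["France", "Ile-de-France", "Paris"]
    pred_loc = ["France", "Ile-de-France", "Versailles"]
    # Two of three location levels match; the year is off by 5 of 10 allowed.
    print(joint_score(pred_loc, truth_loc, pred_year=1925, true_year=1930))
```

A prefix-based location score rewards a model for getting the country or region right even when the finest level is wrong, which fits the idea of a hierarchical sequence. The benchmark's actual matching and weighting rules may differ.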

