CV · AI · LG · Apr 23, 2023

SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models

Cambridge
arXiv:2304.11619v1 · 19 citations · h-index: 38
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of robust satellite image classification for applications like land-use planning, but it is incremental as it primarily benchmarks existing methods on a new dataset.

The authors tackled the challenge of classifying diverse satellite imagery by introducing SATIN, a metadataset from 27 existing datasets, and found that the strongest vision-language model achieved only 52.0% accuracy in zero-shot transfer classification.

Interpreting remote sensing imagery enables numerous downstream applications ranging from land-use planning to deforestation monitoring. Robustly classifying this data is challenging due to the Earth's geographic diversity. While many distinct satellite and aerial image classification datasets exist, there is yet to be a benchmark curated that suitably covers this diversity. In this work, we introduce SATellite ImageNet (SATIN), a metadataset curated from 27 existing remotely sensed datasets, and comprehensively evaluate the zero-shot transfer classification capabilities of a broad range of vision-language (VL) models on SATIN. We find SATIN to be a challenging benchmark: the strongest method we evaluate achieves a classification accuracy of 52.0%. We provide a public leaderboard (https://satinbenchmark.github.io) to guide and track the progress of VL models in this important domain.
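
To make "zero-shot transfer classification" concrete, the sketch below shows the general idea as it is commonly done with vision-language models: embed the image and one text prompt per class in a shared space, then pick the class whose text embedding is most similar to the image embedding. This is an illustration only, not the SATIN evaluation pipeline, which the abstract does not describe. The embeddings, class names, and function names are all made up for the example; in practice the vectors would come from a VL model's image and text encoders.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_predict(image_embedding, class_text_embeddings):
    """Return the class label whose text embedding is closest to the image embedding."""
    img = normalize(image_embedding)
    best_label, best_score = None, -math.inf
    for label, text_vec in class_text_embeddings.items():
        txt = normalize(text_vec)
        score = sum(a * b for a, b in zip(img, txt))  # cosine similarity
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(predictions, targets):
    """Fraction of predictions that match the ground-truth labels."""
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)


# Hypothetical text embeddings, one per class prompt (toy 3-d vectors).
class_text_embeddings = {
    "forest": [1.0, 0.1, 0.0],
    "harbour": [0.0, 1.0, 0.2],
    "farmland": [0.1, 0.0, 1.0],
}

# Hypothetical image embeddings paired with ground-truth labels.
images = [
    ([0.9, 0.2, 0.1], "forest"),
    ([0.1, 0.8, 0.3], "harbour"),
    ([0.6, 0.1, 0.5], "farmland"),  # ambiguous image, lands nearer "forest"
    ([0.2, 0.1, 0.9], "farmland"),
]

preds = [zero_shot_predict(vec, class_text_embeddings) for vec, _ in images]
targets = [label for _, label in images]
print(preds, accuracy(preds, targets))
```

No classifier is trained on the target classes here; only the class prompts change between datasets, which is what allows a single model to be scored across many datasets with different label sets.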

