SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery
SatCLIP gives researchers in fields such as ecology and epidemiology a general-purpose way to use geographic data efficiently. The contribution is incremental, however, because it builds on existing contrastive learning methods.
The paper tackles the challenge of extracting relevant geographic information for modeling tasks by introducing SatCLIP, a global location encoder that learns representations from satellite imagery. SatCLIP embeddings improved prediction performance on nine diverse tasks, such as temperature prediction and population density estimation, and consistently outperformed alternative location encoders.
Geographic information is essential for modeling tasks in fields ranging from ecology to epidemiology. However, extracting relevant location characteristics for a given task can be challenging, often requiring expensive data fusion or distillation from massive global imagery datasets. To address this challenge, we introduce Satellite Contrastive Location-Image Pretraining (SatCLIP). This global, general-purpose geographic location encoder learns an implicit representation of locations by matching CNN- and ViT-inferred visual patterns of openly available satellite imagery with their geographic coordinates. The resulting SatCLIP location encoder efficiently summarizes the characteristics of any given location for convenient use in downstream tasks. In our experiments, we use SatCLIP embeddings to improve prediction performance on nine diverse location-dependent tasks including temperature prediction, animal recognition, and population density estimation. Across tasks, SatCLIP consistently outperforms alternative location encoders and improves geographic generalization by encoding visual similarities of spatially distant environments. These results demonstrate the potential of vision-location models to learn meaningful representations of our planet from the vast, varied, and largely untapped modalities of geospatial data.
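The abstract describes pretraining as matching visual patterns of satellite images with their coordinates, but it does not spell out the objective. A minimal sketch of one way to do that matching is shown here, assuming a CLIP-style symmetric contrastive loss over a batch of (location embedding, image embedding) pairs. The function names, the temperature value of 0.07, and the toy embeddings are illustrative assumptions, not details taken from the paper. In practice the embeddings would come from a location encoder applied to coordinates and a CNN or ViT applied to imagery.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(loc_emb, img_emb, temperature=0.07):
    """Symmetric contrastive loss over paired location and image embeddings.

    loc_emb[i] and img_emb[i] are assumed to describe the same place.
    The temperature is an illustrative value, not one from the paper.
    """
    loc = [l2_normalize(v) for v in loc_emb]
    img = [l2_normalize(v) for v in img_emb]
    # Cosine similarity between every location and every image, scaled.
    logits = [
        [sum(a * b for a, b in zip(l, m)) / temperature for m in img]
        for l in loc
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the location-to-image and image-to-location directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Toy batch: two locations with made-up 2-D embeddings.
locations = [[1.0, 0.0], [0.0, 1.0]]
images_matched = [[1.0, 0.0], [0.0, 1.0]]
images_shuffled = [[0.0, 1.0], [1.0, 0.0]]

loss_matched = clip_style_loss(locations, images_matched)
loss_shuffled = clip_style_loss(locations, images_shuffled)
```

In this toy batch the loss is lower when each location is paired with its own image than when the images are shuffled. Minimizing a loss of this kind pushes the location encoder to place coordinates near the embeddings of their imagery. Downstream tasks would then use only the location encoder's output as input features.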