CV · LG · Apr 24, 2025

A Genealogy of Foundation Models in Remote Sensing

arXiv:2504.17177v2 · 2 citations · h-index: 5 · ACM Trans Spat Algorithm Syst
Originality: Synthesis-oriented
AI Analysis

It addresses a challenge that researchers and practitioners face, namely how to leverage remote sensing data effectively. However, it is incremental because it primarily surveys existing methods rather than introducing new techniques.

This paper reviews the development and application of foundation models in remote sensing, analyzing various approaches and their roots in computer vision to identify advantages, pitfalls, and future directions for improving domain-specific models.

Foundation models have garnered increasing attention for representation learning in remote sensing. Many such foundation models adopt approaches that have proven successful in computer vision, with minimal domain-specific modification. However, the development and application of foundation models in this field are still burgeoning, and there are a variety of competing approaches for how to most effectively leverage remotely sensed data. This paper examines these approaches, along with their roots in the computer vision field, to characterize potential advantages and pitfalls while outlining future directions to further improve remote sensing-specific foundation models. We discuss the quality of the learned representations and methods to alleviate the need for massive compute resources. We first examine single-sensor remote sensing foundation models to introduce concepts and provide context, and then place emphasis on incorporating the multi-sensor aspect of Earth observations into foundation models. In particular, we explore the extent to which existing approaches leverage multiple sensors in training foundation models, in relation to multi-modal foundation models. Finally, we identify opportunities for further harnessing the vast amounts of unlabeled, seasonal, and multi-sensor remote sensing observations.

