CV · AI · May 6, 2025

Towards Efficient Benchmarking of Foundation Models in Remote Sensing: A Capabilities Encoding Approach

arXiv:2505.03299v1 · 4 citations · h-index: 3 · Has Code · 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for efficient benchmarking in remote sensing, with the goal of simplifying model selection for new tasks. Its contribution is incremental, since it builds on existing comparisons of foundation models.

The paper tackles the problem of comparing more than 75 remote sensing vision foundation models, none of which consistently outperforms the others across downstream tasks. It proposes a cost-effective "capabilities encoding" approach that predicts a model's performance on multiple downstream tasks without fine-tuning the model on each one.

Foundation models constitute a significant advancement in computer vision: after a single, albeit costly, training phase, they can address a wide array of tasks. In the field of Earth observation, over 75 remote sensing vision foundation models have been developed in the past four years. However, none has consistently outperformed the others across all available downstream tasks. To facilitate their comparison, we propose a cost-effective method for predicting a model's performance on multiple downstream tasks without the need for fine-tuning on each one. This method is based on what we call "capabilities encoding." The utility of this novel approach is twofold: we demonstrate its potential to simplify the selection of a foundation model for a given new task, and we employ it to offer a fresh perspective on the existing literature, suggesting avenues for future research. Code is available at https://github.com/pierreadorni/capabilities-encoding.
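The abstract does not specify how capabilities encoding is computed. As a rough illustration of the general idea of predicting unseen model-task scores from a sparse table of already reported results, the sketch below uses a swapped-in, generic technique: low-rank matrix factorization, where each model and each task gets a small latent vector plus a bias, and a score is predicted from their dot product. This is a minimal sketch under stated assumptions (scores normalized to [0, 1], one score per model-task pair); the function `fit_capability_encodings`, its parameters, and the synthetic scores are hypothetical and are not the authors' formulation. Their actual implementation is in the repository linked above.

```python
import random
from typing import Callable, List, Tuple

# (model index, task index, normalized score in [0, 1])
Triple = Tuple[int, int, float]


def fit_capability_encodings(
    observed: List[Triple],
    n_models: int,
    n_tasks: int,
    dim: int = 2,
    lr: float = 0.05,
    reg: float = 0.01,
    epochs: int = 2000,
    seed: int = 0,
) -> Callable[[int, int], float]:
    """Hypothetical sketch: fit latent model/task vectors with SGD on observed scores.

    Generic matrix factorization, NOT the paper's exact method.
    Returns a function predict(model_idx, task_idx) -> estimated score.
    """
    rng = random.Random(seed)
    # One small latent vector per model and per task (hypothetical "capability" vectors).
    model_vecs = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_models)]
    task_vecs = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_tasks)]
    model_bias = [0.0] * n_models
    task_bias = [0.0] * n_tasks
    # Global mean of the observed scores, kept fixed during training.
    mu = sum(score for _, _, score in observed) / len(observed)

    def predict(i: int, j: int) -> float:
        dot = sum(a * b for a, b in zip(model_vecs[i], task_vecs[j]))
        return mu + model_bias[i] + task_bias[j] + dot

    for _ in range(epochs):
        for i, j, score in observed:
            err = score - predict(i, j)
            # L2-regularized SGD step on the biases.
            model_bias[i] += lr * (err - reg * model_bias[i])
            task_bias[j] += lr * (err - reg * task_bias[j])
            # L2-regularized SGD step on the latent vectors (use old values for both).
            for k in range(dim):
                m_ik, t_jk = model_vecs[i][k], task_vecs[j][k]
                model_vecs[i][k] += lr * (err * t_jk - reg * m_ik)
                task_vecs[j][k] += lr * (err * m_ik - reg * t_jk)
    return predict


if __name__ == "__main__":
    # Made-up scores for 4 models x 3 tasks (illustration only, not results from the paper).
    quality = [0.1, 0.2, 0.3, 0.4]
    offset = [0.0, 0.1, 0.2]
    full = {(i, j): 0.3 + quality[i] + offset[j] for i in range(4) for j in range(3)}
    held_out = (3, 2)  # pretend model 3 was never evaluated on task 2
    observed = [(i, j, s) for (i, j), s in full.items() if (i, j) != held_out]
    predict = fit_capability_encodings(observed, n_models=4, n_tasks=3)
    print(f"predicted {predict(*held_out):.3f}, hidden true value {full[held_out]:.3f}")
```

On this toy table the held-out prediction should land close to the hidden value, far closer than simply guessing the mean of the observed scores. The point of the design is that a model only needs to be evaluated on a few tasks for its vector to be estimated; scores on the remaining tasks are then read off the learned vectors rather than obtained by fine-tuning.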

Code Implementations: 1 repo
