CV · Mar 11

How To Embed Matters: Evaluation of EO Embedding Design Choices

Luis Gilch, Isabelle Wittmann, Maximilian Nitsche, Johannes Jakubik, Arne Ewald, Thomas Brunschwiler

arXiv:2603.10658v1 · 8.2 · h-index: 31

Predicted impact: top 33% in CV · last 90 days · Originality: Synthesis-oriented

AI Analysis

This work addresses how to build scalable, efficient embedding-based workflows for Earth observation data, offering incremental but practical guidance on embedding design choices.

The paper systematically analyzes how design choices in embedding extraction from Geospatial Foundation Models affect downstream Earth observation task performance, finding that transformer backbones with mean pooling provide strong default embeddings and that aggregated embeddings can be over 500x smaller than raw input data.
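To make "mean pooling" and the size comparison concrete, here is a minimal sketch assuming a ViT-style GeoFM that returns one [CLS] token plus 196 patch tokens of dimension 768 for a 13-band, 224 x 224 tile stored as uint16. These sizes and the helper names `mean_pool_tokens` and `compression_ratio` are hypothetical and are not taken from the paper or from NeuCo-Bench.

```python
import numpy as np


def mean_pool_tokens(tokens: np.ndarray, has_cls: bool = True) -> np.ndarray:
    """Mean-pool transformer token embeddings of shape (num_tokens, dim) to (dim,).

    Hypothetical assumption: index 0 holds a [CLS] token, which is excluded
    so that only the spatial patch tokens are averaged.
    """
    patch_tokens = tokens[1:] if has_cls else tokens
    return patch_tokens.mean(axis=0)


def compression_ratio(raw: np.ndarray, embedding: np.ndarray) -> float:
    """Ratio of raw input bytes to embedding bytes."""
    return raw.nbytes / embedding.nbytes


# Hypothetical setup: 13 spectral bands, 224 x 224 pixels, uint16 values, and a
# ViT with 16 x 16 patches (14 * 14 = 196 patch tokens + 1 [CLS]) of width 768.
rng = np.random.default_rng(0)
raw_tile = rng.integers(0, 10_000, size=(13, 224, 224), dtype=np.uint16)
tokens = rng.standard_normal((197, 768), dtype=np.float32)

embedding = mean_pool_tokens(tokens)  # fixed-size (768,) float32 vector
ratio = compression_ratio(raw_tile, embedding)
print(embedding.shape, embedding.dtype, round(ratio, 1))
```

With these illustrative sizes the ratio is roughly 425x (1,304,576 raw bytes vs. 3,072 embedding bytes). The paper's "over 500x" figure reflects its own input and embedding sizes, which this sketch does not attempt to reproduce.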

Earth observation (EO) missions produce petabytes of multispectral imagery, increasingly analyzed using large Geospatial Foundation Models (GeoFMs). Alongside end-to-end adaptation, workflows make growing use of intermediate representations as task-agnostic embeddings, so that representations can be computed once and reused across downstream tasks. Consequently, when GeoFMs act as feature extractors, decisions about how representations are obtained, aggregated, and combined affect downstream performance and pipeline scalability. Understanding these trade-offs is essential for scalable embedding-based EO workflows, where compact embeddings can replace raw data while remaining broadly useful. We present a systematic analysis of embedding design in GeoFM-based EO workflows. Leveraging NeuCo-Bench, we study how backbone architecture, pretraining strategy, representation depth, spatial aggregation, and representation combination influence EO task performance. We demonstrate the usability of GeoFM embeddings by aggregating them into fixed-size representations more than 500x smaller than the raw input data. Across models, we find consistent trends: transformer backbones with mean pooling provide strong default embeddings, intermediate ResNet layers can outperform final layers, self-supervised objectives exhibit task-specific strengths, and combining embeddings from different objectives often improves robustness.
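The abstract mentions two further design choices: taking features from an intermediate ResNet layer and combining embeddings from models trained with different objectives. The sketch below illustrates one plausible way to do both, under stated assumptions: the ResNet feature map is spatially averaged (global average pooling), and "combining" is implemented as concatenation of per-source L2-normalized vectors. The abstract does not specify the combination method, so concatenation here is an assumption, and the layer shapes and helper names are hypothetical.

```python
import numpy as np


def global_avg_pool(feature_map: np.ndarray) -> np.ndarray:
    """Average a CNN feature map of shape (channels, height, width) to (channels,)."""
    return feature_map.mean(axis=(1, 2))


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale vectors along the last axis to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def combine_embeddings(*embeddings: np.ndarray) -> np.ndarray:
    """Concatenate per-source L2-normalized embeddings along the last axis.

    Normalizing each source first keeps one model's feature scale from
    dominating the combined vector.
    """
    return np.concatenate([l2_normalize(e) for e in embeddings], axis=-1)


# Hypothetical inputs: an intermediate ResNet-50 stage ("layer3": 1024 channels,
# a 14 x 14 map for a 224 x 224 input) and a mean-pooled 768-d embedding from a
# transformer pretrained with a different self-supervised objective.
rng = np.random.default_rng(1)
resnet_layer3 = rng.standard_normal((1024, 14, 14)).astype(np.float32)
vit_embedding = rng.standard_normal(768).astype(np.float32)

combined = combine_embeddings(global_avg_pool(resnet_layer3), vit_embedding)
print(combined.shape)
```

Because the helpers operate on the last axis, the same functions also work on batches of shape (N, dim), so a downstream model such as a linear probe could be trained directly on the combined vectors.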

