CV | LG | Mar 2

From Pixels to Patches: Pooling Strategies for Earth Embeddings

arXiv:2603.02080v1 | h-index: 33
Originality: Incremental advance
AI Analysis

This work addresses a practical issue for practitioners using geospatial AI models, offering incremental improvements in pooling strategies to enhance accuracy and generalization in remote sensing tasks.

The paper tackles the problem of aggregating pixel-level embeddings into patch representations for geospatial foundation models. It finds that, compared to mean pooling, richer pooling schemes such as Generalized Mean Pooling reduce the geographic generalization gap by up to 40% and increase accuracy by up to 5%.

As geospatial foundation models shift from patch-level to pixel-level embeddings, practitioners must aggregate thousands of pixel vectors into patch representations that preserve class-discriminative signal while matching downstream label resolution. The default choice, mean pooling, discards within-patch variability and can drop accuracy by more than 10% under spatial shift. To evaluate this effect, we introduce EuroSAT-Embed: 81,000 embedding GeoTIFFs derived from three foundation models (AlphaEarth, OlmoEarth, and Tessera). We benchmark 11 training-free and 2 parametric pooling methods under both random and geographically disjoint test splits. Our results show that richer pooling schemes reduce the geographic generalization gap by up to 40% relative to mean pooling and increase accuracy by up to 5% on spatial splits. We recommend Generalized Mean Pooling (GeM) as a drop-in replacement for mean pooling: it improves accuracy without increasing embedding dimensionality. For maximum accuracy, Stats pooling (concatenation of min/max/mean/std pooling) performs best at 4x the embedding size. We further find that pooling effectiveness varies across embedding sources and that higher-dimensional embeddings benefit most from distributional statistics.
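To make the three pooling schemes named in the abstract concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: the function names, the `(num_pixels, dim)` input layout, the patch size, the exponent `p=3.0`, and the clamping of values to a small positive `eps` before exponentiation (the usual GeM formulation from image retrieval) are all assumptions. How the paper handles negative embedding values is not stated in the abstract.

```python
import numpy as np


def mean_pool(pixels: np.ndarray) -> np.ndarray:
    """Average pixel embeddings of shape (num_pixels, dim) into one (dim,) vector."""
    return pixels.mean(axis=0)


def gem_pool(pixels: np.ndarray, p: float = 3.0, eps: float = 1e-6) -> np.ndarray:
    """Generalized mean pooling: (mean(x ** p)) ** (1 / p) per dimension.

    Assumption: values are clamped to at least eps so the fractional power is
    defined. p = 1 gives the mean; large p approaches max pooling.
    """
    clamped = np.clip(pixels, eps, None)
    return np.mean(clamped ** p, axis=0) ** (1.0 / p)


def stats_pool(pixels: np.ndarray) -> np.ndarray:
    """Concatenate per-dimension min, max, mean, and std into a (4 * dim,) vector."""
    return np.concatenate([
        pixels.min(axis=0),
        pixels.max(axis=0),
        pixels.mean(axis=0),
        pixels.std(axis=0),
    ])


# Hypothetical patch: 64 x 64 pixels with 64-dimensional embeddings,
# flattened to (num_pixels, dim). Random values stand in for real data.
rng = np.random.default_rng(0)
patch = rng.random((64 * 64, 64))

mean_vec = mean_pool(patch)    # shape (64,)
gem_vec = gem_pool(patch)      # shape (64,), same size as mean pooling
stats_vec = stats_pool(patch)  # shape (256,), 4x the embedding size
```

The output shapes mirror the trade-off described above: GeM keeps the embedding dimensionality of mean pooling, while Stats pooling quadruples it.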

