CV, LG · Sep 13, 2025

Lightweight Metadata-Aware Mixture-of-Experts Masked Autoencoder for Earth Observation

arXiv:2509.10919v1 · 1 citation · h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses accessibility and reuse limitations for Earth Observation downstream tasks through a more efficient model architecture.

The authors address the computational cost of Earth Observation foundation models by proposing a compact Metadata-aware Mixture-of-Experts Masked Autoencoder (MoE-MAE) with only 2.5M parameters. Despite its small size, the model competes with much larger architectures, and its metadata-aware pretraining improves transfer and label efficiency.

Recent advances in Earth Observation have focused on large-scale foundation models. However, these models are computationally expensive, limiting their accessibility and reuse for downstream tasks. In this work, we investigate compact architectures as a practical pathway toward smaller general-purpose EO models. We propose a Metadata-aware Mixture-of-Experts Masked Autoencoder (MoE-MAE) with only 2.5M parameters. The model combines sparse expert routing with geo-temporal conditioning, incorporating imagery alongside latitude/longitude and seasonal/daily cyclic encodings. We pretrain the MoE-MAE on the BigEarthNet-Landsat dataset and evaluate embeddings from its frozen encoder using linear probes. Despite its small size, the model competes with much larger architectures, demonstrating that metadata-aware pretraining improves transfer and label efficiency. To further assess generalization, we evaluate on the EuroSAT-Landsat dataset, which lacks explicit metadata, and still observe competitive performance compared to models with hundreds of millions of parameters. These results suggest that compact, metadata-aware MoE-MAEs are an efficient and scalable step toward future EO foundation models.
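The abstract mentions latitude/longitude and seasonal/daily cyclic encodings but does not spell out their form. A common way to build such features is to map each periodic quantity to a sine/cosine pair, so that values on either side of a wrap-around (December 31 and January 1, or longitudes -180 and 180) end up close together. The sketch below illustrates that idea only; the function names, the 8-dimensional layout, and the chosen periods are assumptions for illustration, not the authors' implementation.

```python
import math
from datetime import datetime


def cyclic(value, period):
    """Map a periodic value to a (sin, cos) pair on the unit circle."""
    angle = 2.0 * math.pi * (value / period)
    return math.sin(angle), math.cos(angle)


def encode_metadata(lat_deg, lon_deg, when):
    """Build an 8-dim metadata vector (hypothetical layout).

    Order: lat sin/cos, lon sin/cos, day-of-year sin/cos, hour-of-day sin/cos.
    """
    # Latitude is not periodic, so its angle is used directly (assumption).
    lat = math.radians(lat_deg)
    lat_sin, lat_cos = math.sin(lat), math.cos(lat)

    # Longitude wraps every 360 degrees.
    lon_sin, lon_cos = cyclic(lon_deg, 360.0)

    # Seasonal cycle: zero-based day of year over an average year length.
    day_of_year = when.timetuple().tm_yday - 1
    season_sin, season_cos = cyclic(day_of_year, 365.25)

    # Daily cycle: fractional hour over 24 hours.
    hour = when.hour + when.minute / 60.0
    hour_sin, hour_cos = cyclic(hour, 24.0)

    return [lat_sin, lat_cos, lon_sin, lon_cos,
            season_sin, season_cos, hour_sin, hour_cos]


if __name__ == "__main__":
    vec = encode_metadata(48.85, 2.35, datetime(2020, 7, 1, 10, 30))
    print([round(x, 3) for x in vec])
```

In a model like the one described, a vector of this kind would typically be projected and combined with the image tokens; how the paper does that is not stated in the abstract.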

